Loading dataset LL1/124_153_svhn_cropped_dataset takes more than 1 hour
When trying to run our system on this dataset, it turned out that the entire 1 hour was spent loading the dataset with D3MDatasetLoader().load.
I took a look, and it gets stuck in this function:
def list_files(base_directory: str)
specifically on this call:
os.walk(base_directory)
It seems that because there are so many files (around 630k), os.walk is very slow. Maybe we should look for another function to replace os.walk.
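
As a starting point for discussion, here is a minimal sketch of one possible alternative: a lazy traversal built directly on os.scandir. The name iter_files and the choice to yield paths relative to base_directory are my assumptions, not what list_files does today. Note that since Python 3.5 os.walk is itself implemented on top of os.scandir, so this may not help much on its own; any gain would most likely come from yielding files one at a time instead of building full per-directory lists, and from avoiding extra per-file work inside the loop. It would need to be timed on the actual dataset.

```python
import os
from typing import Iterator


def iter_files(base_directory: str) -> Iterator[str]:
    """Lazily yield regular files under base_directory, as paths relative to it.

    Hypothetical helper for illustration; not the existing list_files.
    """
    # Stack of directories still to visit, stored relative to base_directory.
    stack = [""]
    while stack:
        rel_dir = stack.pop()
        # os.scandir returns DirEntry objects whose is_dir()/is_file()
        # usually do not need an extra stat() system call.
        with os.scandir(os.path.join(base_directory, rel_dir)) as entries:
            for entry in entries:
                rel_path = os.path.join(rel_dir, entry.name)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(rel_path)
                elif entry.is_file(follow_symlinks=False):
                    yield rel_path
```

Because this is a generator, the caller can start processing files before the whole tree has been scanned, and the order of results is not guaranteed. Symbolic links are skipped here, which may or may not match the current behavior.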