Reduce memory footprint
When loading huge amounts of data, it is not only the data itself that takes up memory, but also the Python objects that represent the data tree. Depending on the structure of the HDF5 files and on how many files there are, this can add up to a considerable number of objects, easily exceeding tens of millions.
While evaluating a cluster run with a total output file size of 10 GB and a couple of million datasets, the data tree alone (using solely proxy objects) took up about 14 GB of memory. This is way too much.
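A first step is probably to find out where that memory actually goes. Below is a minimal, non-dantro-specific sketch using the stdlib `tracemalloc` module to list the source lines that allocate the most memory while the tree is being built; the list comprehension is just a stand-in for the actual loading call and would be replaced by whatever populates the tree.

```python
"""Sketch: locate allocation hot spots while the data tree is built."""
import tracemalloc

tracemalloc.start(25)  # record up to 25 stack frames per allocation

# ... build the data tree here; the line below is only a stand-in allocation ...
tree = [dict(idx=i) for i in range(100_000)]

snapshot = tracemalloc.take_snapshot()

# Show the 15 source locations responsible for the most allocated memory
for stat in snapshot.statistics("lineno")[:15]:
    print(stat)
```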
To Do
- Find out what exactly takes up so much memory
  - `dantro.utils.coords` seems to be one major point (improved in !257 (merged))
  - ...
- Reduce object size
  - Remove unnecessary cache attributes in data tree objects (done in !257 (merged))
  - Consider using `__slots__` in more places (see the sketch below)
  - …
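As a rough illustration of the `__slots__` point above: a slotted class does away with the per-instance `__dict__`, which becomes relevant when millions of small tree objects are alive at once. The class names below are illustrative only, not actual dantro classes.

```python
"""Sketch: per-instance memory with and without __slots__."""
import sys


class NodeWithDict:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class NodeWithSlots:
    __slots__ = ("name", "parent")  # fixed attribute set, no per-instance __dict__

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


a = NodeWithDict("node")
b = NodeWithSlots("node")

# The slotted instance has no __dict__ to account for
size_a = sys.getsizeof(a) + sys.getsizeof(a.__dict__)
size_b = sys.getsizeof(b)
print(f"with __dict__:  {size_a} bytes per instance")
print(f"with __slots__: {size_b} bytes per instance")
```

Note that in an inheritance hierarchy the saving only materialises if every class in the chain defines `__slots__`; otherwise instances still carry a `__dict__`.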