Resolve "Implement the `DataManager`"
This MR implements the `DataManager` class. It loads data from a fixed `data_dir`, as configured by a `load_cfg`.
Main features:
- Is associated with a data directory
- Can create a time-stamped output directory inside the data directory
- Can load data via a load configuration (see below for example configuration)
- Can control where the loaded data is stored within the data manager
  - into a new group or container with the name of the configuration entry (default)
  - into a container with a specified name
  - into an existing or non-existing group (by path)
- Ensures that data is not silently overwritten; needs to be explicitly configured to allow this
- Gives a progress indicator
- Can be easily extended by data loader mixin classes
- Gives understandable error messages if something was badly configured or went wrong
Note that, to achieve this, quite a lot of procedural code and parameters were needed in the load method and its helpers.
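To illustrate the "no silent overwrite" and "store by path" features listed above, here is a minimal sketch using plain nested dicts in place of groups and containers. The function name `store` and its `overwrite` argument are hypothetical and are not taken from the `DataManager` interface.

```python
def store(tree: dict, path: str, data, *, overwrite: bool = False) -> None:
    """Store ``data`` at a slash-separated ``path`` inside a nested dict.

    Hypothetical helper: dicts stand in for groups, everything else for
    containers. Missing groups along the path are created on the fly.
    """
    *groups, name = path.strip("/").split("/")

    node = tree
    for group in groups:
        # Create the group if it does not exist yet, then descend into it
        node = node.setdefault(group, {})
        if not isinstance(node, dict):
            raise ValueError(f"'{group}' in path '{path}' is not a group")

    # Refuse to replace existing data unless explicitly allowed
    if name in node and not overwrite:
        raise ValueError(
            f"'{path}' already exists; pass overwrite=True to replace it"
        )
    node[name] = data
```

Calling `store(tree, "my_group/cfg", data)` twice raises a `ValueError` on the second call unless `overwrite=True` is passed.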
Loader mixin classes
This MR implements two loader mixin classes:
- `YamlLoaderMixin`: for loading YAML data
- `Hdf5LoaderMixin`: for loading HDF5 data, recursively resolving `h5.Group` objects to `DataGroup`s, and `h5.Dataset`s to `DataContainer`s, carrying over attributes
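The mixin approach can be sketched roughly as follows. This is not the code from this MR: the decorator `loader`, the classes `JsonLoaderMixin` and `MiniManager`, and the `_load_<name>` lookup convention are all assumptions made for illustration, and JSON is used as a stand-in format so that the sketch only needs the standard library.

```python
import json


def loader(name):
    """Hypothetical decorator that declares a method as a load function."""
    def decorator(func):
        func.loader_name = name  # marker checked by the manager below
        return func
    return decorator


class JsonLoaderMixin:
    """Adds a 'json' loader to whichever manager class inherits from it."""

    @loader("json")
    def _load_json(self, filepath):
        with open(filepath) as f:
            return json.load(f)


class MiniManager(JsonLoaderMixin):
    """Stand-in for a manager class extended via loader mixins."""

    def load_file(self, loader_name, filepath):
        # Look up the load function by name; only accept declared loaders
        load_func = getattr(self, f"_load_{loader_name}", None)
        if load_func is None or not hasattr(load_func, "loader_name"):
            raise ValueError(f"No loader named '{loader_name}' available")
        return load_func(filepath)
```

With this layout, supporting another format means writing one more mixin class and adding it to the manager's bases; the dispatch code stays unchanged.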
Example load configuration
---
minimal:
  loader: yaml  # yml also works
  glob_str: '*.yml'
  # all yml files on the top level of the `data_dir` will be loaded
  # into the OrderedDataGroup /minimal
all_yaml:
  loader: yaml
  glob_str:
    - '*.yml'
    - '*.yaml'
    - '**/*.yml'
    - '**/*.yaml'
  always_create_group: true
  # all files in the data_dir or its subdirectories that match any one
  # of the four glob strings will be loaded. The results will be stored
  # in the group /all_yaml, even if only a single file was found.
grouped_data:
  loader: yaml
  glob_str: 'group/*.yml'
  ignore: ['group/cfg.yml']
  always_create_group: true
  target_group: my_group
  # all yml files in data_dir/group (except group/cfg.yml) will be loaded
  # into /my_group, even if it is only one file
uni_data:
  loader: hdf5_proxy
  glob_str: 'universes/**/uni*_data.h5'
  path_regex: 'uni([0-9]+)_data.h5'
  # all hdf5 files of the given pattern will be loaded into
  # the new group `/uni_data` under a regex-parsed name
  # Example: uni123_data.h5 will be accessible via /uni_data/123
  # Additionally, the data is loaded as a proxy and is only read if needed
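How the `glob_str`, `ignore` and `path_regex` keys might translate into a set of files and target names can be sketched as below. The function `resolve_files` is hypothetical and only mirrors the behaviour described in the comments of the example configuration. It assumes that paths are relative to the data directory and that the first regex group provides the name.

```python
import glob
import os
import re


def resolve_files(data_dir, glob_str, ignore=(), path_regex=None):
    """Return a ``{name: relative_path}`` mapping for one load entry."""
    patterns = [glob_str] if isinstance(glob_str, str) else list(glob_str)

    # Collect matches of all patterns; a set drops duplicate hits, e.g.
    # files matched by both '*.yml' and '**/*.yml'
    matches = set()
    for pattern in patterns:
        found = glob.glob(os.path.join(data_dir, pattern), recursive=True)
        matches.update(os.path.normpath(path) for path in found)

    ignored = {os.path.normpath(os.path.join(data_dir, p)) for p in ignore}

    named = {}
    for path in sorted(matches - ignored):
        rel = os.path.relpath(path, data_dir)
        if path_regex is None:
            # Default name: the file name without its extension
            name = os.path.splitext(os.path.basename(path))[0]
        else:
            match = re.search(path_regex, rel)
            if match is None:
                raise ValueError(f"'{rel}' does not match '{path_regex}'")
            name = match.group(1)
        if name in named:
            raise ValueError(f"Name clash for '{name}': {named[name]}, {rel}")
        named[name] = rel
    return named
```

Raising on a name clash rather than keeping the last match follows the same idea as the overwrite protection: two files should never end up under one name without the user noticing.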
Can this MR be accepted?
- Implementation finished
  - `DataManager` class
  - Initialisation functions, directory creation, ...
  - `load` function for a single entry and helpers
  - `load_from_cfg` function that can load multiple entries
  - An interface to extend the class with loader mixin classes
  - A decorator to declare these functions
  - Useful loader mixins
  - A `YamlLoaderMixin`
  - A `Hdf5LoaderMixin` (depends on `NumpyDataContainer`, i.e. #4 (closed) and !4 (merged))
  - `Hdf5DataProxy` and corresponding mixin class
- Tests written
  - Full coverage of `data_mngr` module (except `NotImplementedError` and impossible-to-test cases)
  - Implement fixture to write output data that is then loaded in again as `data_dir`
  - Test that results are loaded into the desired location
  - Test that name clashes (upon existing data) are communicated clearly
  - `YamlLoaderMixin` tested
  - `Hdf5LoaderMixin` tested
  - Proxy data working
- Pipeline passing
- MR Description written
Related issues
Closes #3 (closed)
Edited by Utopia Developers