Implement TransformationsManager
Previous title: Add default functions for storing TransformationDAG output into files or inside DataManager
When using the TransformationDAG only for computing data (and not for plotting), one has to piggy-back on the PlotManager and the plot_func to store the resulting data.
Two very useful defaults would be:
- Store all final nodes of the TransformationDAG in the original DataManager.
- Store all final nodes of the TransformationDAG as files in the out_dir (currently defined in the PlotManager cfg).
Proposal
As far as I can see, the TransformationDAG's intended use case is creating plots. However, if one wants to use it only for data transformation, it seems one has to use the PlotManager nonetheless to make the data available outside the dantro ecosystem. Using the plot functions for storing data is cumbersome:
- When using simple plot functions without decorator, the DataManager is not available and has to be piped through the TransformationDAG with its own node and tag.
- When extracting data from the DAG for adding it to the DataManager or storing it into a file, the tags have to be hard-coded.
However, the TransformationDAG should actually have the information on which nodes are the "final" ones. These could automatically be written to files, using the default writers for the respective data containers.
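To illustrate how the "final" nodes could be found without hard-coding tags, here is a minimal sketch in plain Python. It does not use dantro's API: the `deps` mapping and the `terminal_tags` helper are hypothetical stand-ins for whatever bookkeeping the TransformationDAG does internally. The idea is simply that a tag is terminal if no other node depends on it.

```python
def terminal_tags(deps):
    """Return the tags that no other node depends on, sorted by name.

    ``deps`` maps each tag to the list of tags it depends on. This is a
    hypothetical stand-in, not dantro's internal data structure.
    """
    # Collect every tag that appears as a dependency of some node
    used = {dep for dep_list in deps.values() for dep in dep_list}
    # Terminal tags are those that are never used as a dependency
    return sorted(tag for tag in deps if tag not in used)


# Hypothetical example: two intermediate results feed one final result
deps = {
    "data": [],
    "mean": ["data"],
    "std": ["data"],
    "std_norm": ["std", "mean"],
}
print(terminal_tags(deps))  # prints ['std_norm']
```

With such a default, explicitly listing tags would only be needed when intermediate results (like `mean` here) should be stored as well.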
Ideas for an implementation
(very unsure about this...)
Implement a lightweight StorageManager akin to the PlotManager. Configuration could look like this:
---
data_dir: ~/
data_manager:
  # ...
storage_manager:
  raise_exc: true
  out_dir: '{timestamp:}/'
  default_task: to_dm  # or 'to_file'
  # Need some way of specifying which nodes to store,
  # if not only the "terminal" ones.
  # NOTE: Not specifying nodes here would lead to only
  #       'std_norm' being stored in this example
  task_kwargs:
    store_nodes: [mean, std_norm]
eval:
  task1:
    use_dag: true
    select:
      data: data  # From data_manager
    transform:
      - .mean: [!dag_tag data]
        kwargs: {axis: 0}
        tag: mean
      - .std: [!dag_tag data]
        kwargs: {axis: 0}
        tag: std
      - div: [!dag_tag std, !dag_tag mean]
        tag: std_norm
Code:
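import dantro as dtr
from dantro import DataManager

# cfg is the configuration above, loaded into a dict; data_dir is cfg["data_dir"]
dm = DataManager(data_dir, **cfg.get("data_manager", {}))
dm.load_from_cfg(load_cfg=cfg["data_manager"]["load_cfg"], print_tree=True)
sm = dtr.StorageManager(dm=dm, **cfg.get("storage_manager", {}))  # proposed class, does not exist yet
sm.eval_from_cfg(eval_cfg=cfg.get("eval"))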
Afterwards, I would expect the dm to contain the two new nodes mean and std_norm:
Tree of DataManager '28947753', 4 members, 0 attributes
└┬ data <XrDataContainer, float32, …
└ task1 <XrDataContainer, float32, …
└┬ mean <XrDataContainer, float32, …
└ std_norm <XrDataContainer, float32, …
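To make the expected behaviour more concrete, here is a rough sketch of the two default tasks in plain Python. Everything in it is an assumption for illustration: `StorageManagerSketch` and its `store` method are hypothetical names, the `dm` is a plain nested dict rather than a dantro DataManager, and JSON files stand in for the default writers of the respective data containers. The DAG computation itself is left out; `results` is assumed to be an already computed mapping from tag to value.

```python
import json
import tempfile
from pathlib import Path


class StorageManagerSketch:
    """Hypothetical stand-in for the proposed StorageManager.

    ``dm`` is a plain nested dict here, not a dantro DataManager.
    """

    def __init__(self, dm, out_dir, default_task="to_dm"):
        self.dm = dm
        self.out_dir = Path(out_dir)
        self.default_task = default_task

    def store(self, task_name, results, store_nodes=None, task=None):
        """Store selected entries of ``results`` (a tag -> value mapping).

        If ``store_nodes`` is None, this sketch stores all given results;
        the proposal would instead default to the terminal nodes.
        """
        task = task or self.default_task
        tags = list(results) if store_nodes is None else store_nodes
        selected = {tag: results[tag] for tag in tags}

        if task == "to_dm":
            # Attach the results as a new group named after the task
            self.dm.setdefault(task_name, {}).update(selected)
        elif task == "to_file":
            # One JSON file per tag; a real implementation would rather
            # use the default writer of each data container type
            target = self.out_dir / task_name
            target.mkdir(parents=True, exist_ok=True)
            for tag, value in selected.items():
                (target / f"{tag}.json").write_text(json.dumps(value))
        else:
            raise ValueError(f"Unknown task: {task!r}")
        return selected


# Usage with made-up values; only 'mean' and 'std_norm' are stored
dm = {"data": [1.0, 2.0, 3.0]}
results = {"mean": 2.0, "std": 1.0, "std_norm": 0.5}
out_dir = tempfile.mkdtemp()

sm = StorageManagerSketch(dm, out_dir=out_dir)
sm.store("task1", results, store_nodes=["mean", "std_norm"])
sm.store("task1", results, store_nodes=["std_norm"], task="to_file")
```

Afterwards, `dm["task1"]` holds the `mean` and `std_norm` entries, and `std_norm.json` is written to the `task1` subdirectory of `out_dir`.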