Find an abstraction for the current `HDF5` module which allows integration of `Apache Arrow` and possibly other future data systems under a common interface.
Description
Currently we only support `HDF5` as data output format, though we decided to look into supporting `Apache Arrow` as well. This issue is the first concrete step in this direction: designing a common basis for `HDF5` and `Arrow` that supports future extensions as far as possible, together with a common interface through which both can be accessed, so that `YAML`-based configuration is supported as much as possible.
Proposal
In order to avoid breaking changes, allow future extensions, and support `YAML`-based configuration, it would be desirable to have a common interface that abstracts away the underlying library-specific implementation of data input and output. Switching between different backends could then be accomplished with dynamic polymorphism. This, in turn, requires a common base layer from which `HDF5`, `Arrow`, and all other possible future data-IO systems inherit.
Because the `HDF5` backend was not designed with this in mind, the abstraction can be expected to be leakier than it would have been had it been present from the start. This issue exists to come up with a suitable design for this system. This will require some experimentation and probably a lot of back and forth between different ideas.
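As a starting point for discussion, the common base layer and the runtime backend switch described above could look roughly like the following sketch. All names (`DataBackend`, `Hdf5Backend`, `ArrowBackend`, `make_backend`) are hypothetical and do not exist in the current code base. The backends only store data in memory, as a stand-in for the actual library calls. The string passed to the factory is assumed to come from the `YAML` configuration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical common interface; library-specific details stay in subclasses.
class DataBackend {
public:
    virtual ~DataBackend() = default;

    // Identifier of the backend, e.g. as used in a config file
    virtual std::string name() const = 0;

    // Write a dataset to the given path inside the output
    virtual void write(const std::string& path,
                       const std::vector<double>& data) = 0;

    // Number of datasets written so far
    virtual std::size_t num_datasets() const = 0;
};

// Stand-in for the existing HDF5 module (assumption: no real HDF5 calls here)
class Hdf5Backend final : public DataBackend {
public:
    std::string name() const override { return "hdf5"; }

    void write(const std::string& path,
               const std::vector<double>& data) override {
        // A real implementation would call into the HDF5 library here.
        _store[path] = data;
    }

    std::size_t num_datasets() const override { return _store.size(); }

private:
    std::map<std::string, std::vector<double>> _store;
};

// Stand-in for a future Arrow backend (assumption: no real Arrow calls here)
class ArrowBackend final : public DataBackend {
public:
    std::string name() const override { return "arrow"; }

    void write(const std::string& path,
               const std::vector<double>& data) override {
        // A real implementation would build Arrow arrays or tables here.
        _store[path] = data;
    }

    std::size_t num_datasets() const override { return _store.size(); }

private:
    std::map<std::string, std::vector<double>> _store;
};

// Select the backend at runtime from a string, e.g. read from a config entry
std::unique_ptr<DataBackend> make_backend(const std::string& kind) {
    if (kind == "hdf5") {
        return std::make_unique<Hdf5Backend>();
    }
    if (kind == "arrow") {
        return std::make_unique<ArrowBackend>();
    }
    throw std::invalid_argument("unknown data backend: " + kind);
}
```

Calling code would then only depend on `DataBackend`, for example `auto out = make_backend("arrow"); out->write("/grp/x", {1.0, 2.0});`. How much of the existing `HDF5` functionality (groups, attributes, chunking) can be expressed through such an interface without leaking is exactly the open design question of this issue.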
How to test the implementation?
All current tests in the `utopia` main repo, as well as those in the `models` repo, must still run successfully in their current form.
Related issues
Meta-task: #372