Support for dictionaries and nested dictionaries in dataset attributes
This is a follow-up from #150 (closed).
An issue was raised by @peendebak:
One feature I would like from a storage backend, but which is not supported by h5netcdf either, is nested dictionaries. E.g.
dataset.attrs['myvalue']={'y': {'z': 1, 'a': [1,2]} }
Indeed this is not supported by most engines. How often is this use-case required from your experience? How many nested levels does it usually involve?
From my developer experience with PycQED, allowing nested dictionaries opens an entire Pandora's box of complications to deal with.
Would flattening the dictionary be a good enough solution?
Something close to a "proper" solution would be to simply serialize dictionaries with e.g. JSON, though this does not sound very elegant and would introduce some new conventions on the dataset format.
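The flattening option raised above could look roughly like the sketch below. The helper names `flatten` and `unflatten` and the `.` separator are assumptions made for illustration, not part of quantify or h5netcdf. The sketch also assumes string keys that do not themselves contain the separator, and it silently drops empty dictionaries.

```python
def flatten(nested, parent_key="", sep="."):
    """Collapse a nested dict into a single-level dict with joined keys."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def unflatten(flat, sep="."):
    """Rebuild the nested dict from the joined keys."""
    nested = {}
    for full_key, value in flat.items():
        parts = full_key.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


attrs = {"myvalue": {"y": {"z": 1, "a": [1, 2]}}}
flat = flatten(attrs)
restored = unflatten(flat)
```

Here `flat` has the keys `myvalue.y.z` and `myvalue.y.a`, each holding a scalar or a list, which is closer to what attribute stores typically accept.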
We make regular use of (nested) dictionaries in the dataset metadata. One thing is the station snapshot (which in quantify is stored in a separate file), but we store other data as well.
Flattening is an option, but I would like it to be built into the framework (e.g. convert the attrs to a JSON string before storing, and convert the JSON back to a dictionary when loading).
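A minimal sketch of that JSON round trip, using only the standard library, is given here. The function names `encode_attrs` and `decode_attrs` and the `__json__:` prefix are hypothetical; the prefix is one possible convention for telling serialized dictionaries apart from ordinary string attributes, and it is exactly the kind of new dataset-format convention mentioned earlier in the thread.

```python
import json

# Hypothetical marker so the loader can recognise serialized dictionaries.
JSON_PREFIX = "__json__:"


def encode_attrs(attrs):
    """Replace dict-valued attributes with prefixed JSON strings."""
    encoded = {}
    for key, value in attrs.items():
        if isinstance(value, dict):
            encoded[key] = JSON_PREFIX + json.dumps(value)
        else:
            encoded[key] = value
    return encoded


def decode_attrs(attrs):
    """Turn prefixed JSON strings back into dictionaries."""
    decoded = {}
    for key, value in attrs.items():
        if isinstance(value, str) and value.startswith(JSON_PREFIX):
            decoded[key] = json.loads(value[len(JSON_PREFIX):])
        else:
            decoded[key] = value
    return decoded


attrs = {"myvalue": {"y": {"z": 1, "a": [1, 2]}}, "name": "run"}
stored = encode_attrs(attrs)      # what would be written to the file
loaded = decode_attrs(stored)     # what the user gets back after loading
```

Note that JSON only round-trips its own types: tuples come back as lists, non-string keys become strings, and NumPy arrays would need a custom encoder.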
You are right that storing nested dictionaries is not easy, but it can certainly be done. Converters to JSON format are available. Another option would be https://pypi.org/project/hickle/, which can store nested dictionaries in HDF5.
I opened an issue at h5netcdf, https://github.com/h5netcdf/h5netcdf/issues/86, to see what the possibilities are for native support in h5netcdf.
From the reply Pieter got on the h5netcdf issue tracker:
Although I've never seen any python dictionary directly read or written to netcdf or hdf5 attributes, it doesn't mean it won't exist. But also the documentation over at NetCDF [1] states that "Attributes are all either scalars (single-valued) or vectors (a single, fixed dimension)" [2].
You can still solve this by storing the key,value-pairs via groups (hdf5/netcdf4), but this would involve a specific reader/writer function on your side to correctly pack/unpack your dictionary. Examples are on SO for hdf5 [3][4] which could be adapted to netcdf4 or h5netcdf.
[1] https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_data_set_components.html
[2] https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_data_set_components.html#attributes
[3] https://stackoverflow.com/a/62470570
[4] https://stackoverflow.com/a/61262342