Better API to handle metadata

While I like how metadata is currently structured, with a standard JSON schema and selectors which allow attaching metadata to any part of the data, I must admit that I do not (yet) like the API for managing all this metadata. I think it is powerful that you can attach metadata to any part of the data, especially with more complicated values in the future, which will not be just tabular data of scalar values but multi-dimensional data, potentially with parts of the data in various formats/subtypes. I also think it is good that metadata is immutable and that changing it gives you a new object.

But the API to manage this is confusing and hard to use, for both querying and updating. It is probably also hard because metadata is not automatically updated together with data (which will be addressed with #35 (closed)). For tabular data this might be improved by #55 (closed). This issue tries to collect common problems with the API and to think about potential solutions.

ALL_ELEMENTS semantics

Currently, the ALL_ELEMENTS special selector segment has somewhat surprising semantics. It is used for both updating and querying:

  • For updating, it means that all current and future values of a particular dimension will have some metadata (until it is removed or changed through another ALL_ELEMENTS update for the same metadata key(s)).
  • For querying, it returns metadata only if that metadata was set using ALL_ELEMENTS, and not when the metadata merely happens to hold for all current elements (for example, because it was set on each element individually). On the other hand, metadata which was set using ALL_ELEMENTS is still returned even if one element does not have the same value.
  • When querying an individual item, ALL_ELEMENTS metadata is merged with the item's own metadata.
  • It is important to understand that ALL_ELEMENTS is different from the dimension itself. Metadata about a dimension as a whole (for example, the number of elements and the structural type of the dimension) goes into the "dimension" key of the parent dimension. ALL_ELEMENTS is something which applies to the elements themselves.
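To make the semantics above concrete, here is a minimal sketch for a single dimension. It is a toy model, not the real metadata API: the class ToyDimensionMetadata, its update and query methods, and the string used for ALL_ELEMENTS are all made up for illustration.

```python
from typing import Any, Dict, Hashable

# Hypothetical stand-in for the real ALL_ELEMENTS sentinel.
ALL_ELEMENTS = "__ALL_ELEMENTS__"


class ToyDimensionMetadata:
    """Toy, immutable-style model of metadata for the elements of one dimension."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, Dict[str, Any]] = {}

    def update(self, segment: Hashable, metadata: Dict[str, Any]) -> "ToyDimensionMetadata":
        # Metadata is immutable: an update returns a new object.
        new = ToyDimensionMetadata()
        new._entries = {key: dict(value) for key, value in self._entries.items()}
        new._entries.setdefault(segment, {}).update(metadata)
        return new

    def query(self, segment: Hashable) -> Dict[str, Any]:
        if segment == ALL_ELEMENTS:
            # Only what was explicitly set through ALL_ELEMENTS is returned.
            return dict(self._entries.get(ALL_ELEMENTS, {}))
        # An individual element: ALL_ELEMENTS merged with its own metadata.
        merged = dict(self._entries.get(ALL_ELEMENTS, {}))
        merged.update(self._entries.get(segment, {}))
        return merged


# The same value set on every current element individually.
per_element = ToyDimensionMetadata()
for index in range(3):
    per_element = per_element.update(index, {"unit": "cm"})

# Set once through ALL_ELEMENTS, then overridden on one element.
wildcard = ToyDimensionMetadata().update(ALL_ELEMENTS, {"unit": "cm"}).update(1, {"unit": "m"})

print(per_element.query(ALL_ELEMENTS))  # {}
print(wildcard.query(ALL_ELEMENTS))     # {'unit': 'cm'}
print(wildcard.query(1))                # {'unit': 'm'}
print(wildcard.query(7))                # {'unit': 'cm'}
```

The first query returns nothing even though every current element has the value, while the second still returns the ALL_ELEMENTS value even though element 1 differs. The last line shows that an element which does not exist yet already inherits the ALL_ELEMENTS value.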

This is surprising, but it is unclear what a solution would be. Automatically detecting that all elements share some metadata and moving it to ALL_ELEMENTS looks like a solution, but it does not really work, because ALL_ELEMENTS also holds for all future elements. So if somebody else later adds a new element without some metadata key, that element will now also have that metadata. Doing an update with ALL_ELEMENTS is at least explicit about that. But imagine that I am adding elements one by one: I add the first one and set its metadata, it is detected that all elements have that metadata, so it is moved to ALL_ELEMENTS. Then I add another element with some other metadata keys, and the keys and values from the first element automatically apply to it. This is not what one would expect.
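The one-by-one scenario can be reproduced in a few lines. This is only a sketch over plain dictionaries: promote_shared is a hypothetical auto-detection step that does not exist in the current API, and query mimics the merge behaviour for individual elements described above.

```python
from typing import Any, Dict, Hashable

# Hypothetical stand-in for the real ALL_ELEMENTS sentinel.
ALL_ELEMENTS = "__ALL_ELEMENTS__"

Entries = Dict[Hashable, Dict[str, Any]]


def query(entries: Entries, segment: Hashable) -> Dict[str, Any]:
    """Metadata of one element: ALL_ELEMENTS merged with its own metadata."""
    merged = dict(entries.get(ALL_ELEMENTS, {}))
    merged.update(entries.get(segment, {}))
    return merged


def promote_shared(entries: Entries) -> Entries:
    """Hypothetical auto-detection: move key/value pairs shared by all
    current individual elements to ALL_ELEMENTS."""
    elements = {k: v for k, v in entries.items() if k != ALL_ELEMENTS}
    if not elements:
        return entries
    values = list(elements.values())
    shared = {
        key: value
        for key, value in values[0].items()
        if all(key in other and other[key] == value for other in values[1:])
    }
    promoted: Entries = {ALL_ELEMENTS: {**entries.get(ALL_ELEMENTS, {}), **shared}}
    for segment, metadata in elements.items():
        promoted[segment] = {k: v for k, v in metadata.items() if k not in shared}
    return promoted


# Add the first element and set its metadata.
entries: Entries = {0: {"unit": "cm"}}
# With a single element, everything is trivially "shared" and gets promoted.
entries = promote_shared(entries)
# Add a second element with a different metadata key only.
entries[1] = {"name": "height"}

print(query(entries, 1))  # {'unit': 'cm', 'name': 'height'}
```

The second element never had "unit" set, yet it reports it, because the promotion silently turned a statement about element 0 into a statement about all future elements.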

Maybe a solution could be an API where you can get:

  • Values set for ALL_ELEMENTS, knowing that some elements might not really have that metadata. (This already exists.)
  • Individual metadata for values, merged with ALL_ELEMENTS. (This already exists.)
  • A query for a particular metadata key, which would return what holds for ALL_ELEMENTS (if anything), and then all individual cases which differ. This should be relatively easy to implement. The caller can then decide whether to inspect the exceptions further, use the common case, or do something else.
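As a rough sketch of the third option, assuming the same toy dictionary representation (one dictionary of metadata per selector segment, with a made-up ALL_ELEMENTS sentinel), such a query could look like this. The name query_key_with_exceptions and the NO_VALUE marker are hypothetical, and the sketch does not model elements where a key was explicitly removed.

```python
from typing import Any, Dict, Hashable, Tuple

# Hypothetical stand-in for the real ALL_ELEMENTS sentinel.
ALL_ELEMENTS = "__ALL_ELEMENTS__"
# Marker for "nothing is set through ALL_ELEMENTS for this key".
NO_VALUE = object()

Entries = Dict[Hashable, Dict[str, Any]]


def query_key_with_exceptions(entries: Entries, key: str) -> Tuple[Any, Dict[Hashable, Any]]:
    """Return the ALL_ELEMENTS value for `key` (or NO_VALUE) together with
    all individual elements whose own value for `key` differs from it."""
    common = entries.get(ALL_ELEMENTS, {}).get(key, NO_VALUE)
    exceptions: Dict[Hashable, Any] = {}
    for segment, metadata in entries.items():
        if segment == ALL_ELEMENTS:
            continue
        if key in metadata and metadata[key] != common:
            exceptions[segment] = metadata[key]
    return common, exceptions


entries: Entries = {
    ALL_ELEMENTS: {"unit": "cm"},
    0: {"name": "width"},
    1: {"unit": "m"},
    2: {"unit": "cm"},
}

common, exceptions = query_key_with_exceptions(entries, "unit")
print(common, exceptions)  # cm {1: 'm'}

common, exceptions = query_key_with_exceptions(entries, "name")
print(common is NO_VALUE, exceptions)  # True {0: 'width'}
```

With a result shaped like this, the caller sees the common case and the exceptions in one call and can choose which of the two to act on.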
Edited by Mitar