SDMX-CSV 2.0.0 (meta)data upload - part 1
As Gyorgy,
I want to upload data (values for measure and attributes) and values for referential metadata attributes (defined through MSD-link in DSD) using the SDMX-CSV 2.0.0 data message format,
So that I can start storing and managing referential metadata in .Stat Core.
Technical approach
The technical specification to be used is here: https://github.com/sdmx-twg/sdmx-csv/blob/sdmx3.0.0/data-message/docs/sdmx-csv-field-guide.md
- Instead of using full SDMX 3.0.0 for structures and data, we use a hybrid approach based on the existing structure artefact versions (2.1), adding only the required new data message versions of SDMX 3.0.0. This approach allows for a faster implementation while still ensuring compliance with the SDMX standard.
- We use the current DSD 2.1 version with an annotation of type "Metadata" linking in its title to an MSD 2.1 artefact (through a URN).
<common:Annotations>
<common:Annotation>
<common:AnnotationTitle>urn:sdmx:org.sdmx.infomodel.metadatastructure.MetadataStructure=OECD:MSD_TEST(1.0)</common:AnnotationTitle>
<common:AnnotationType>Metadata</common:AnnotationType>
</common:Annotation>
</common:Annotations>
- The new SDMX-CSV (meta)data message 2.0 reader must allow omitting any component column, including dimension columns. This is needed when the data file doesn't contain observation-level measures or attributes, but only attributes attached at higher levels.
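As an illustration of the DSD-to-MSD link described above, the following minimal Python sketch looks for an annotation of type "Metadata" and extracts the MSD reference from the URN in its title. The function name `find_msd_reference` and the regular expression are hypothetical helpers, not part of .Stat Core; the sketch also assumes that the `common:` prefix maps to the SDMX-ML 2.1 common namespace.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the common: prefix maps to the SDMX-ML 2.1 "common" namespace.
COMMON_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
NS = {"common": COMMON_NS}

# Hypothetical pattern for the MSD URN form shown in the annotation title.
MSD_URN = re.compile(
    r"^urn:sdmx:org\.sdmx\.infomodel\.metadatastructure\.MetadataStructure="
    r"(?P<agency>[^:]+):(?P<id>[^(]+)\((?P<version>[^)]+)\)$"
)


def find_msd_reference(annotations_xml):
    """Return (agency, id, version) of the linked MSD, or None if there is no link."""
    root = ET.fromstring(annotations_xml)
    for annotation in root.iter(f"{{{COMMON_NS}}}Annotation"):
        # Only annotations of type "Metadata" carry the MSD link.
        annotation_type = annotation.findtext(
            "common:AnnotationType", default="", namespaces=NS
        )
        if annotation_type.strip() != "Metadata":
            continue
        title = annotation.findtext(
            "common:AnnotationTitle", default="", namespaces=NS
        )
        match = MSD_URN.match(title.strip())
        if match:
            return match["agency"], match["id"], match["version"]
    return None


sample = f"""
<common:Annotations xmlns:common="{COMMON_NS}">
  <common:Annotation>
    <common:AnnotationTitle>urn:sdmx:org.sdmx.infomodel.metadatastructure.MetadataStructure=OECD:MSD_TEST(1.0)</common:AnnotationTitle>
    <common:AnnotationType>Metadata</common:AnnotationType>
  </common:Annotation>
</common:Annotations>
"""

print(find_msd_reference(sample))  # ('OECD', 'MSD_TEST', '1.0')
```

A DSD without such an annotation yields `None`, in which case an import would presumably treat the file as containing data only.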
Notes
- The new SDMX-CSV 2.0.0 data message format can be recognised by the new keyword in the first cell (first row, first column): STRUCTURE. This term can be extended with a sub-field delimiter enclosed in square brackets "[]", e.g. "STRUCTURE[;]".
- The new SDMX-CSV 2.0.0 data message format includes an optional action column. Files without that column should be accepted in .Stat imports and are interpreted as an "Append" action, which corresponds to the current way of imports/transfers. The "Information" action is also interpreted as an "Append" action. The import will thus support the two actions "Information" and "Append". The actions "Replace" and "Delete" don't need to be supported yet, as they are to be implemented in #127 (closed).
- The special features for multiple values (for metadata attributes this means multiple attribute instances, but each instance can only have one value), nested metadata attributes, non-coded multi-lingual values, non-coded multi-lingual multiple values and non-coded single or multiple XHTML values are to be supported for those special metadata attributes defined through the MSD link. Note that later, in full SDMX 3.0.0, normal DSD attributes and measures will also be allowed to have such types of values.
- The order of the first columns (structure, structure id, action, keys - if requested) is fixed, but all following component columns can be in any order.
- The implementation includes adding efficient, performant storage for the new metadata attribute values. It will probably be necessary to decide whether the current attribute storage is extended, whether a new dedicated metadata attribute storage is implemented, or whether all attributes (normal and metadata attributes) will be stored in the new way. Note that the query mechanism to be implemented afterwards (see dotstatsuite-core-sdmxri-nsi-ws#150 (closed)) should allow returning only a subset of attributes, and only with the required dimensions. A systematic extraction of all attribute values at series and observation level has been abandoned in SDMX 3.0.0. This might have significant consequences for the most efficient attribute storage approach.
- Referential metadata defined by MSD 3.0.0 and Metadataflow 3.0.0 attached to (maintainable) structural artefacts, and uploadable through the SDMX-CSV 2.0.0 metadata message format, as defined here https://github.com/sdmx-twg/sdmx-csv/blob/sdmx3.0.0/metadata-message/docs/sdmx-csv-field-guide.md, is not part of this ticket, and will be implemented only later. However, the DB design should already take that future need into consideration.
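The format detection and action handling described in the notes above could be sketched as follows in Python. This is only an illustration under stated assumptions: the column header `ACTION` and the single-letter action codes "I", "A", "R" and "D" are assumed here and should be checked against the field guide, the helper names `detect_sdmx_csv_2` and `effective_action` are hypothetical, and the component columns in the sample row are made up.

```python
import csv
import io
import re

# First header cell: "STRUCTURE", optionally followed by "[<sub-field delimiter>]".
STRUCTURE_CELL = re.compile(r"^STRUCTURE(?:\[(?P<delim>.+)\])?$")

# Assumed action codes (verify against the SDMX-CSV field guide).
APPEND_LIKE = {"", "I", "A"}   # missing column/value, Information, Append
DEFERRED = {"R", "D"}          # Replace, Delete: not supported in this ticket


def detect_sdmx_csv_2(first_cell):
    """Return (is_sdmx_csv_2, sub_field_delimiter) for the first header cell."""
    match = STRUCTURE_CELL.match(first_cell.strip())
    if match is None:
        return False, None
    return True, match["delim"]


def effective_action(row):
    """Map a row (dict keyed by column header) to the action the import applies."""
    code = (row.get("ACTION") or "").strip().upper()
    if code in APPEND_LIKE:
        return "Append"
    if code in DEFERRED:
        raise NotImplementedError(f"action {code!r} is not supported yet")
    raise ValueError(f"unknown action {code!r}")


# Hypothetical sample file; REF_AREA and OBS_VALUE are illustrative columns only.
text = (
    "STRUCTURE[;],STRUCTURE_ID,ACTION,REF_AREA,OBS_VALUE\n"
    "dataflow,UNSD:DF_JENS_DAILY(1.0),I,FR,1.5\n"
)
reader = csv.reader(io.StringIO(text))
header = next(reader)
is_v2, delimiter = detect_sdmx_csv_2(header[0])
for values in reader:
    row = dict(zip(header, values))
    print(is_v2, delimiter, effective_action(row))
```

Keying each row by its header name rather than by position is one way to accommodate component columns that appear in any order or are omitted altogether.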
Example
Data structures: https://nsi-qa-stable.siscc.org/rest/dataflow/UNSD/DF_JENS_DAILY/1.0?references=all&detail=referencepartial DAILY_DSD.xml
Metadata structure definition: https://nsi-qa-stable.siscc.org/rest/metadatastructure/OECD/MSD_TEST/1.0 OECD+MSD_TEST+1.0.xml
Note that there is currently a bug in the XML writer: it doesn't write the isMultiLingual property in TextFormat:
<structure:MetadataAttribute id="STRING_MULTILANG_TYPE" minOccurs="1" maxOccurs="1">
<structure:ConceptIdentity>
<Ref id="STRING_MULTILANG_CONCEPT" maintainableParentID="DATATYPE_CONCEPTS_TEST" maintainableParentVersion="1.0" agencyID="IT1" package="conceptscheme" class="Concept" />
</structure:ConceptIdentity>
<structure:LocalRepresentation>
<structure:TextFormat textType="String" isMultiLingual="true"/>
</structure:LocalRepresentation>
</structure:MetadataAttribute>
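To spot metadata attributes affected by the writer issue, a small Python check along the following lines could list every MetadataAttribute whose TextFormat carries no explicit isMultiLingual property. It is a sketch only: the function name is hypothetical, the `structure:` prefix is assumed to map to the SDMX-ML 2.1 structure namespace, and an absent property is merely reported, not judged to be wrong.

```python
import xml.etree.ElementTree as ET

# Assumption: the structure: prefix maps to the SDMX-ML 2.1 "structure" namespace.
STRUCTURE_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
NS = {"s": STRUCTURE_NS}


def attributes_missing_multilingual_flag(msd_xml):
    """Return ids of MetadataAttributes whose TextFormat lacks isMultiLingual."""
    root = ET.fromstring(msd_xml)
    missing = []
    # iter() also visits nested metadata attributes.
    for attribute in root.iter(f"{{{STRUCTURE_NS}}}MetadataAttribute"):
        text_format = attribute.find("s:LocalRepresentation/s:TextFormat", NS)
        if text_format is not None and "isMultiLingual" not in text_format.attrib:
            missing.append(attribute.get("id"))
    return missing


sample = f"""
<structure:MetadataAttribute xmlns:structure="{STRUCTURE_NS}"
    id="STRING_MULTILANG_TYPE" minOccurs="1" maxOccurs="1">
  <structure:LocalRepresentation>
    <structure:TextFormat textType="String" isMultiLingual="true"/>
  </structure:LocalRepresentation>
</structure:MetadataAttribute>
"""

print(attributes_missing_multilingual_flag(sample))  # []
```

Running the same check on the XML returned by the metadatastructure endpoint would show which attributes are written without the property.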
Data file: https://nsi-qa-stable.siscc.org/rest/data/UNSD,DF_JENS_DAILY,1.0/all?dimensionAtObservation=AllDimensions UNSD-DF_JENS_DAILY-1.0-data.csv
Metadata files:
Note: The hybrid metadata linkage approach does not yet take specific attachment levels for DSD-linked metadata attributes into account. Such metadata attributes can be attached at any level.