New action MERGE for data imports
SDMX messages can contain specific actions to perform against the destination database. The task is to change the transfer-service import process to process separately incremental updates (action: legacy "A" for Append and new "M" for Merge) and incremental deletes (action: "D" for Delete). Full replacements will be done later if required. This change is required to support machine to machine communications, including internal and external data space transfers. It could also contribute to improve the performance of data imports/transfers if possible.
- For SDMX-ML imports, the action is set for each dataset.
- For SDMX-CSV 2.0 imports, a new
ACTION
column (after the structure identification column) is available. The action is set for each individual row (observation or attribute(s)).
For SDMX-CSV 1.0 imports, which do not have an action column, the default action is "Information". If such data messages are loaded into an SDMX database, the action "A" (Append)/"M" (Merge) is assumed. -
For EDD+Excel, a new (virtual)ACTION
attribute can be added by the user at any level (observation, partial key, dataflow/dsd). to be done later
SDMX (3.0.0) defines the following actions for data messages (which are currently being reviewed by the SDMX-TWG):
- "I": Information - Data is for information purposes. If such data messages are loaded into an SDMX database, the action "A" (Append) is assumed.
- "A": Append - Data is for an incremental update of existing observations or partial-key attributes or for the provision of new data formerly absent. This means that only the information provided explicitly in the message should be altered. Any measure or attribute value that is to be added or changed must be provided. However, the absence of an observation value or a data attribute at any level does not imply deletion; instead it is simply implied that the value is to remain unchanged. Therefore, it is valid and acceptable to send a data message with an action of Append which (in addition to identifying structure columns) contains only identifying dimensions with some attribute values. In this case, whatever the attachment level of the attributes is, the values for the attributes will be updated. Note that it is not permissible to update measure or attribute values using incomplete identification information, e.g. without the structure ID or without the necessary dimensions (full key for measures, full key/partial key/none for attributes).
- "R": Replace - Data is for replacement. Existing observations are to be fully replaced. Existing attribute values are to be replaced. Observations or attribute values formally absent will be appended.
- "D": Delete - Data is to be deleted. 'Delete' is assumed to be an incremental deletion. The deletion is to take place at the lowest level of detail provided. Concretely, if a 'Delete' set/row only contains the identification information of the structural artefact (dataflow, data structure definition or metadata provision agreement) without any dimension, measure and attribute values then all data for the given artefact will be deleted. If the set/row contains only the structure identification and partial dimension values then all observations and all attribute values relating to those dimension values will be deleted. If the set/row contains only the structure identification, partial dimension values as well as values for some of the related attributes then only these attribute values will be deleted. If the set/row contains only the structure identification and full dimension values then the related observation and all its observation-level attribute values will be deleted. Finally, if the set/row contains only the structure identification, full dimension values as well as values for some of the related measures and attributes then only these measure and observation-level attribute values will be deleted. To be deleted measure and attribute values must be non-empty, e.g. marked with the dash character "-".
- For convenience, if this column is absent then the action "Information" is assumed.
.Stat CORE specific:
- "Data" in the former list refers to observation values, attribute values and referential metadata values. However, to simplify, in .Stat Suite data deletes do not act on referential metadata. A separate upload is necessary for ref. metadata.
Therefore, whenever the data message contains observation values or attribute values, then only those are deleted. Referential metadata needed to be deleted through a separate data message, see: #352 (closed). - In addition to "I" (information) and "A" (Append) for legacy reasons, also the new "M" (Merge) action keyword should be supported. All 3 would execute the current incremental data updates. "R" (Replace) will be implemented later.
"Merge" rules for SDMX-XML and SDMX-CSV to be supported in .Stat CORE:
- M: Merge actions are assumed to take place at the lowest level of detail provided in the message, however:
- Dimensions can be omitted only when uploading higher-level attribute values that are not attached at these dimensions. Note that the attachment level of attributes is always predefined.
- Apart from dimensions, whenever only specific values for measures or attributes are provided then only these values are updated/inserted. Any not specified (omitted) value is not changed.
- The same convention as used for delete actions is used to indicate presence and omission of components in merge actions.
- It is admissible to save merge actions as is in an SDMX database and return them as is through the GET queries when using the updatedAfter or includeHistory URL parameters.
- Referential metadata are to be updated/imported separately, see #352 (closed)
- All observations and attributes impacted by the merge action change their time stamp.
- The following special measure and attribute values indicate intentionally missing values, which are to be NULLED in the database (and to be returned using these special values in extractions):
Format | Numeric values | Textual values |
---|---|---|
XML | NaN | <empty> |
CSV | #N/A | #N/A |
- The following convention is used to indicate presence and omission of components in merge actions:
Format | Dimension value is | Measure or attribute value is | |
---|---|---|---|
omitted | omitted | present | |
Behaviour | Only for higher-level attributes that are not attached to this dimension (SQL notion: IS NULL). | Not to be changed. | To be updated/inserted. |
XML | xml element/attribute is absent | xml element/attribute is absent | Any valid value or intentionally missing value (Numeric: NaN, String: <empty>) |
CSV | <empty> or column is absent | <empty> or column is absent | Any valid value or intentionally missing value (#N/A) |
Mix actions in the same import file
- SDMX-CSV file can contain multiple actions (delete, merge, information, append) and the transfer service should be able to process all of the contents.
- In SDMX-CSV there cannot be multiple rows which target the same action for the same coordinate or sub-coordinate.
Implementation details
Both importable SDMX data message formats (SDMX-ML, SDMX-CSV) must support these actions.
The implementation covers:
-
Change the import process and validation for data and attributes -
Produce the SQL scripts to update existing databases -
Update the data database model documentation
.Stat Core permissions
Eurostat originally has foreseen 3 different permission related to data changes:
-
CanImportData
: Allows adding new embargoed and non-embargoed data (Note: In .Stat Suite this permission currently also allows updating existing embargoed and non-embargoed data) -
CanUpdateData
: Allows updating existing embargoed and non-embargoed data (Note: In .Stat Suite this permission is currently ineffective) -
CanDeleteData
: Allows deleting embargoed and non-embargoed data