7.2.A Data DELETE operation
SDMX messages can contain specific actions to perform against the destination database. The task is to change the transfer-service import process to process separately incremental updates (action: legacy "A" for Append and new "M" for Merge) and incremental deletes (action: "D" for Delete). Full replacements will be done later if required. This change is required to support machine to machine communications, including internal and external data space transfers. It could also contribute to improve the performance of data imports/transfers if possible.
- For SDMX-ML imports, the action is set for each dataset.
- For SDMX-CSV 2.0 imports, a new
ACTION
column (after the structure identification column) is available. The action is set for each individual row (observation or attribute(s)).
For SDMX-CSV 1.0 imports, which do not have an action column, the default action is "Information". If such data messages are loaded into an SDMX database, the action "A" (Append)/"M" (Merge) is assumed. -
For EDD+Excel, a new (virtual)to be done laterACTION
attribute can be added by the user at any level (observation, partial key, dataflow/dsd).
SDMX (3.0.0) defines the following actions for data messages (which are currently being reviewed by the SDMX-TWG):
- "I": Information - Data is for information purposes. If such data messages are loaded into an SDMX database, the action "A" (Append) is assumed.
- "A": Append - Data is for an incremental update of existing observations or partial-key attributes or for the provision of new data formerly absent. This means that only the information provided explicitly in the message should be altered. Any measure or attribute value that is to be added or changed must be provided. However, the absence of an observation value or a data attribute at any level does not imply deletion; instead it is simply implied that the value is to remain unchanged. Therefore, it is valid and acceptable to send a data message with an action of Append which (in addition to identifying structure columns) contains only identifying dimensions with some attribute values. In this case, whatever the attachment level of the attributes is, the values for the attributes will be updated. Note that it is not permissible to update measure or attribute values using incomplete identification information, e.g. without the structure ID or without the necessary dimensions (full key for measures, full key/partial key/none for attributes).
- "R": Replace - Data is for replacement. Existing observations are to be fully replaced. Existing attribute values are to be replaced. Observations or attribute values formally absent will be appended.
- "D": Delete - Data is to be deleted. 'Delete' is assumed to be an incremental deletion. The deletion is to take place at the lowest level of detail provided. Concretely, if a 'Delete' set/row only contains the identification information of the structural artefact (dataflow, data structure definition or metadata provision agreement) without any dimension, measure and attribute values then all data for the given artefact will be deleted. If the set/row contains only the structure identification and partial dimension values then all observations and all attribute values relating to those dimension values will be deleted. If the set/row contains only the structure identification, partial dimension values as well as values for some of the related attributes then only these attribute values will be deleted. If the set/row contains only the structure identification and full dimension values then the related observation and all its observation-level attribute values will be deleted. Finally, if the set/row contains only the structure identification, full dimension values as well as values for some of the related measures and attributes then only these measure and observation-level attribute values will be deleted. To be deleted measure and attribute values must be non-empty, e.g. marked with the dash character "-".
- For convenience, if this column is absent then the action "Information" is assumed.
.Stat CORE specific:
- "Data" in the former list refers to observation values, attribute values and referential metadata values. However, to simplify, in .Stat Suite data deletes do not act on referential metadata. A separate upload is necessary for ref. metadata.
Therefore, whenever the data message contains observation values or attribute values, then only those are deleted. Referential metadata needed to be deleted through a separate data message, see: #352 (closed). - In addition to "I" (information) and "A" (Append) for legacy reasons, also the new "M" (Merge) action keyword should be supported. All 3 would execute the current incremental data updates. "R" (Replace) will be implemented later.
Delete rules for SDMX-XML and SDMX-CSV to be supported in .Stat CORE:
- D: Delete actions are assumed to take place at the lowest level of detail provided in the message:
- Dimension can be omitted. This means that the value of this dimension doesn't matter (SQL notion: any incl. NULL). With this, whole datasets, any slices of whole observations for dimension groups such as time series or individual whole observations can be deleted.
- Apart from dimensions, whenever only specific values for measures or attributes are provided then only these values are deleted (within the given dimensional scope), otherwise all non-dimension component values related to the given dimensional scope are deleted. Instead of real values for non-dimensional components, it is sufficient to use any valid place-holder or "missing" value.
- It is admissible to save delete actions as is in an SDMX database and return them as is through the GET queries when using the updatedAfter or includeHistory URL parameters.
- Deleting a non-existing observation or attribute is not resulting in an error.
- Referential metadata are to be deleted separately, see #352 (closed)
- All observations and attributes impacted by the delete action change their time stamp.
- The following convention is used to indicate presence and omission of components in delete actions:
Format | Dimension value is | Measure or attribute value is | |
---|---|---|---|
omitted | omitted | present | |
Dimension value doesn't matter (SQL notion: any incl. NULL). | To be deleted only if all non-dimension component values related to the given dimensional scope are omitted. | To be deleted. | |
XML | xml element/attribute is absent | xml element/attribute is absent | Any valid value or Numeric: NaN, String: <empty> |
CSV | <empty> or column is absent | <empty> or column is absent | Any valid value or #N/A or * |
"Merge" rules for SDMX-XML and SDMX-CSV to be supported in .Stat CORE: This part has been moved, and will be implemented in this other ticket
Mix actions in the same import file
- SDMX-CSV file can contain multiple actions (delete, merge, information, append) and the transfer service should be able to process all of the contents.
- In SDMX-CSV there cannot be multiple rows which target the same action for the same coordinate or sub-coordinate.
Implementation details
Both importable SDMX data message formats (SDMX-ML, SDMX-CSV) must support these actions.
The implementation covers:
-
Add a new transfer API version -
Change the import process and validation for data and attributes -
Extend the internal (space-to-space) transfer process for data and attributes to replicate all relevant deleted observations - Does this needs the updatedAfter parameter implementation ?To be implemented in this ticket -
Produce the SQL scripts to update existing databases -
Update the data database model documentation
Technical constraints:
- In the first implementation it is ok to limit one upload to 1 action, however once the updatedAfter parameter is being implemented, it is necessary to accept both DELETE and APPEND/MERGE actions in the same input message.
Examples
Click here to open examples
DSD+MSD components:
- dimensions: DIM_1,DIM_2,TIME_PERIOD
- measures: OBS_VALUE
- attributes: DF_ATTR (defined at dataflow level), GR_ATTR, (defined at group level: DIM_2), TS_ATTR (defined at time series level), OBS_ATTR (defined at observation level)
Files:
- structures: OECD-DF_TEST_DELETE-1.0-all_structures.xml
- all data: OECD-DF_TEST_DELETE-1.0-all_data.csv
Delete | STRUCTURE | STRUCTURE_ID | ACTION | DIM_1 | DIM_2 | TIME_PERIOD | OBS_VALUE | OBS_ATTR | TS_ATTR | GR_ATTR | DF_ATTR |
---|---|---|---|---|---|---|---|---|---|---|---|
- whole content of the dataflow | dataflow | OECD:DF_TEST_DELETE(1.0) | D | ||||||||
- all observations of the dataflow and the observation-level attributes | dataflow | OECD:DF_TEST_DELETE(1.0) | D | * | * | ||||||
- all observation values of the dataflow | dataflow | OECD:DF_TEST_DELETE(1.0) | D | * | |||||||
- all attributes of the dataflow | dataflow | OECD:DF_TEST_DELETE(1.0) | D | * | * | * | * | ||||
- attributes attached at dataflow level | dataflow | OECD:DF_TEST_DELETE(1.0) | D | * | |||||||
- everything related to DIM_2=B | dataflow | OECD:DF_TEST_DELETE(1.0) | D | B | |||||||
dataflow | OECD:DF_TEST_DELETE(1.0) | D | B | * | * | * | * | ||||
- everything attached to DIM_2=B | dataflow | OECD:DF_TEST_DELETE(1.0) | D | B | * | ||||||
- all observations (and its observation-level attributes) related to DIM_2=B | dataflow | OECD:DF_TEST_DELETE(1.0) | D | B | * | * | |||||
- all time series attributes | dataflow | OECD:DF_TEST_DELETE(1.0) | D | * | |||||||
- whole time series DIM_1=A,DIM_2=B | dataflow | OECD:DF_TEST_DELETE(1.0) | D | A | B | ||||||
- the attributes attached to time series DIM_1=A,DIM_2=B | dataflow | OECD:DF_TEST_DELETE(1.0) | D | A | B | * | |||||
- observation values & observation-level attributes for time series DIM_1=A,DIM_2=B | dataflow | OECD:DF_TEST_DELETE(1.0) | D | A | B | * | * | ||||
- observation (and its observation-level attributes) attached to key DIM_1=A,DIM_2=B,TIME_PERIOD=2021 | dataflow | OECD:DF_TEST_DELETE(1.0) | D | A | B | 2021 | |||||
- observation value attached to key DIM_1=A,DIM_2=B,TIME_PERIOD=2021 | dataflow | OECD:DF_TEST_DELETE(1.0) | D | A | B | 2021 | * |
Excel version: Delete_action_examples.xlsx
1) delete whole content of the dataflow
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,,,,,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_1__delete_whole_content_of_the_dataflow.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_1__delete_whole_content_of_the_dataflow.xml
2) delete all observations of the dataflow and the observation-level attributes
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,,,*,*,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_2__delete_all_observations_of_the_dataflow_and_the_observation-level_attributes.csv
XML File: NOT POSSIBLE
3) delete all observation values of the dataflow
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,,,*,,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_3__delete_all_observation_values_of_the_dataflow.csv
XML File: NOT POSSIBLE
4) delete all attributes of the dataflow
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,,,,*,*,*,*
CSV File: OECD-DF_TEST_DELETE-1.0-case_4__delete_all_attributes_of_the_dataflow.csv
XML File: NOT POSSIBLE
5) delete attributes attached at dataflow level
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,,,,,,,*
CSV File: OECD-DF_TEST_DELETE-1.0-case_5__delete_attributes_attached_at_dataflow_level.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_5__delete_attributes_attached_at_dataflow_level.xml
6) delete everything related to DIM_2=B
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,B,,,,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_6-1__delete_everything_related_to_DIM_2_B_way1.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_6-1__delete_everything_related_to_DIM_2_B_way1.xml
or
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,B,,*,*,*,*,
CSV File: OECD-DF_TEST_DELETE-1.0-case_6-2__delete_everything_related_to_DIM_2_B_way2.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_6-2__delete_everything_related_to_DIM_2_B_way2.xml
7) delete everything attached to DIM_2=B
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,B,,,,,*,
CSV File: OECD-DF_TEST_DELETE-1.0-case_7__delete_everything_attached_to_DIM_2_B.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_7__delete_everything_attached_to_DIM_2_B.xml
8) delete all observations (and its observation-level attributes) related to DIM_2=B
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,B,,*,*,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_8__delete_all_observations__and_its_observation-level_attributes__related_to_DIM_2_B.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_8__delete_all_observations__and_its_observation-level_attributes__related_to_DIM_2_B.xml
9) delete all time series attributes
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,,,,,,*,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_9__delete_all_time_series_attributes.csv
XML File: NOT POSSIBLE
10) delete whole time series DIM_1=A,DIM_2=B
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,A,B,,,,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_10__delete_whole_time_series_DIM_1_A_DIM_2_B.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_10__delete_whole_time_series_DIM_1_A_DIM_2_B.xml
11) delete the attributes attached to time series DIM_1=A,DIM_2=B
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,A,B,,,,*,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_11__delete_the_attributes_attached_to_time_series_DIM_1_A_DIM_2_B.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_11__delete_the_attributes_attached_to_time_series_DIM_1_A_DIM_2_B.xml
12) delete observation values & observation-level attributes for time series DIM_1=A,DIM_2=B
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,A,B,,*,*,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_12__delete_observation_values___observation-level_attributes_for_time_series_DIM_1_A_DIM_2_B.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_12__delete_observation_values___observation-level_attributes_for_time_series_DIM_1_A_DIM_2_B.xml
13) delete observation (and its observation-level attributes) attached to key DIM_1=A,DIM_2=B,TIME_PERIOD=2021
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,A,B,2021,,,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_13__delete_observation__and_its_observation-level_attributes__attached_to_key_DIM_1_A_DIM_2_B_TIME_PERIOD_2021.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_13__delete_observation__and_its_observation-level_attributes__attached_to_key_DIM_1_A_DIM_2_B_TIME_PERIOD_2021.xml
14) delete observation value attached to key DIM_1=A,DIM_2=B,TIME_PERIOD=2021
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,TIME_PERIOD,OBS_VALUE,OBS_ATTR,TS_ATTR,GR_ATTR,DF_ATTR
dataflow,OECD:DF_TEST_DELETE(1.0),D,A,B,2021,*,,,,
CSV File: OECD-DF_TEST_DELETE-1.0-case_14__delete_observation_value_attached_to_key_DIM_1_A_DIM_2_B_TIME_PERIOD_2021.csv
XML File: OECD-DF_TEST_DELETE-1.0-case_14__delete_observation_value_attached_to_key_DIM_1_A_DIM_2_B_TIME_PERIOD_2021.xml
And last, here is an example file with mixed actions
Technical Notes
MS SQL Server now provides an in-built time machine through "temporal tables":
See: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
This out-of-the-box feature allows managing deleted data states. It will also be required for features #30 (closed) amd dotstatsuite-core-sdmxri-nsi-ws#57 (closed)
.Stat Core permissions
Eurostat originally has foreseen 3 different permission related to data changes:
-
CanImportData
: Allows adding new embargoed and non-embargoed data (Note: In .Stat Suite this permission currently also allows updating existing embargoed and non-embargoed data) -
CanUpdateData
: Allows updating existing embargoed and non-embargoed data (Note: In .Stat Suite this permission is currently ineffective) -
CanDeleteData
: Allows deleting embargoed and non-embargoed data
Since currently CanUpdateData
is unusable, it should be looked into if the original separation (as done also for structures) can be respected in .Stat Core.
Validations
The following 3 advanced validations will be removed in this ticket, and re-introduced with the adjustments described in this other ticket:
- MandatoryDatasetAttributeWithNullValueInDatabase
- MandatoryAttributeWithNullValueInDatabase
- MultipleValuesForDimGroupAttributeInDatabase