Failure in data upload for some dataflows due to auto-generated categories (with same ID but different level)
For certain data uploads, unexpected duplicate-key errors occur, for example:
Failed: .Stat data request for 'TN1:DF_ENTR_EXIT_OF_BUSINESS_BY_ACTIVITY(1.0)' in 'staging:SIS-CC-stable' (ID 1050)
Your data management request has been processed, please see below the details:
Summary
Request ID: 1050
Data source: 20200807214154560.csv
Destination data space: staging:SIS-CC-stable
Dataflow: TN1:DF_ENTR_EXIT_OF_BUSINESS_BY_ACTIVITY(1.0)
User: my e-mail address
Completion status: Failed
Log messages
Date Level Message
08/07/2020 19:41:56 NOTICE The request with ID 1050 was successfully registered.
08/07/2020 19:41:57 NOTICE 1156 observations are being processed.
08/07/2020 19:41:57 WARN The processing of observations was stopped and dropped due to one or more errors. Please read the details of the error(s) to see how to resolve it/them.
08/07/2020 19:41:57 ERROR An item with the same key has already been added. Key: POP_ES
Functional analysis outcome:
The issue appears to be related to categories (and, where necessary, the related CategorySchemes) that the NSI WS generates automatically when dataflows with categorisations are created or uploaded and the referenced category does not exist at the level specified in the categorisation. For instance, if a category "CAT_01_01" exists as a child of a category "CAT_01", but the uploaded categorisation wrongly uses "CAT_01_01" instead of the correct "CAT_01.CAT_01_01", then the NSI web service automatically creates an additional "CAT_01_01" category at root level.
Example: In the "SIS-CC stable" space, the category "POP_ES" is duplicated in the FCSA category scheme (http://nsi-stable-siscc.redpelicans.com/rest/categoryscheme/FCSA/all/1.0.0): one instance at level "DEM.POP", and another, auto-generated one at root level.
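The following Python sketch illustrates how a duplicated category ID at two hierarchy levels can produce this kind of error. It is not the actual .Stat or NSI WS code: the `Category` class and the functions `walk`, `index_by_id` and `index_by_path` are hypothetical names introduced only for illustration, and the assumption is that some component indexes categories by their bare ID. Keying by the full path (e.g. "DEM.POP.POP_ES") would keep the two categories distinct.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Category:
    """Hypothetical stand-in for an SDMX category node."""
    id: str
    children: List["Category"] = field(default_factory=list)


def walk(categories, parent_path=()):
    """Yield (full_path, category) for every category in the tree."""
    for cat in categories:
        path = parent_path + (cat.id,)
        yield path, cat
        yield from walk(cat.children, path)


def index_by_id(categories):
    """Index keyed by bare ID: fails when an ID repeats at another level."""
    index = {}
    for _path, cat in walk(categories):
        if cat.id in index:
            raise KeyError(
                f"An item with the same key has already been added. Key: {cat.id}"
            )
        index[cat.id] = cat
    return index


def index_by_path(categories):
    """Index keyed by the dotted full path: duplicates at other levels are fine."""
    return {".".join(path): cat for path, cat in walk(categories)}


# Mirrors the FCSA example: POP_ES under DEM.POP plus an auto-generated root copy.
scheme = [
    Category("DEM", [Category("POP", [Category("POP_ES")])]),
    Category("POP_ES"),
]
```

Calling `index_by_id(scheme)` raises on the second "POP_ES", whereas `index_by_path(scheme)` returns four distinct entries.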
The automated generation of categories (and if necessary related CategorySchemes) will be switched off with ticket dotstatsuite-core-sdmxri-nsi-ws#72 (closed).
The issues to be solved here are the following:
- The fact that there are two categories with the same ID (at different hierarchy levels) in the category scheme must not block data uploads, because categories have no functional usage in data uploads.
- Even if the notification message says that the upload was unsuccessful, the edited/changed data values still seem to be uploaded correctly into the DB and can be retrieved again from NSI. This behaviour is inconsistent and thus not acceptable. If an error occurs and the "Completion status" is "Failed", then the uploaded data must not be stored in the DB.
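The second requirement (a "Failed" status must leave the DB unchanged) can be sketched with a single transaction around the whole upload. This is only an illustration using Python's built-in `sqlite3` module, not the actual transfer service: the `obs` table, the `upload` function and the `post_check` callback are assumptions made for the example.

```python
import sqlite3


def upload(conn, observations, post_check):
    """Write all observations atomically; return the completion status."""
    try:
        # The connection context manager commits on success and
        # rolls back if any exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT OR REPLACE INTO obs (key, value) VALUES (?, ?)",
                observations,
            )
            post_check()  # a late error must undo the writes above
    except Exception:
        return "Failed"
    return "Completed"


def failing_check():
    """Hypothetical late step that hits the duplicate-key error."""
    raise KeyError("An item with the same key has already been added. Key: POP_ES")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (key TEXT PRIMARY KEY, value REAL)")
conn.execute("INSERT INTO obs (key, value) VALUES ('A', 1.0)")
conn.commit()

status = upload(conn, [("A", 2.0), ("B", 3.0)], failing_check)
rows = conn.execute("SELECT key, value FROM obs ORDER BY key").fetchall()
```

With this structure, `status` is "Failed" and `rows` still contains only the original observation, so the reported status and the stored data agree.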
Data examples:
- Excel workbook, sheet "Sheet5", generates the issue when saving the whole table to the database: Transfer_upload_error.xlsx
- Example reproducible locally: TOPICS.xml, CH1_population_Multiple.xml, px-x-0103010000_201_mapping_edd_upd.xml, px-x-0103010000_201_small_upload.xls