Automated Open-readiness checks for new DSDs and Dataflow updates
We would like to automate certain checks on datasets. The following checks apply to new DSDs and to Dataflow updates.
- No duplication of labels within a dimension [what about different hierarchy levels?]
- No blank units for any observation
- Common units only (OECD:CL_UNIT_MEASURE codelist and OECD:UNIT_MEASURE dimension or attribute is used) [This rule will need to change as the new Unit of measure system allows customisation via the annotations.]
- A dataflow-level annotation UNIT_MEASURE_CONCEPTS lists the dimension and/or attribute IDs that define the full "Unit of measure" and that are used to auto-generate the corresponding labels. At minimum it must contain "UNIT_MEASURE,UNIT_MULT,BASE_PER" (See: dotstatsuite-data-explorer#113 (closed)) [Unless dataflow annotations specify otherwise - the real rule should check the baseline settings for the space if annotations are not provided, and check whether the annotations are correct and yield a proper UoM when they are provided.]
- Referential metadata at least at dataflow level in all languages defined for the Data Explorer [there is no Ref. Metadata yet.]
- Generic attributes used correctly as flags or footnotes [?]
- Non-generic attributes must have labels [?]
- The SDMX time dimension and the OECD:FREQ dimension must be used
- Number of data points in dataflow [?]
- A dimension ID that is not included in the UNIT_MEASURE_CONCEPTS annotation list must not contain the term “UNIT” [In the case of microdata, UNIT_ID could make sense, but we can start off with this restriction and make sure that DONOR_ID or similar is used instead]
- All structure information displayed in the DE (dataflow, dimensions, dimension values) must exist in all languages defined for the Data Explorer
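
As an illustration only, a few of the checks listed above could be sketched as follows. The sketch assumes a simplified, hypothetical data model (plain dicts, lists and strings, not the API of any SDMX library), and all function names are invented for the example. Labels are compared exactly after trimming whitespace, which is also an assumption; the open question about hierarchy levels is not addressed.

```python
from collections import Counter

# Minimum content of the UNIT_MEASURE_CONCEPTS annotation, as stated in the list above.
REQUIRED_UNIT_CONCEPTS = {"UNIT_MEASURE", "UNIT_MULT", "BASE_PER"}


def parse_unit_concepts(annotation_value):
    """Split the comma-separated annotation value into a set of IDs."""
    return {part.strip() for part in annotation_value.split(",") if part.strip()}


def duplicate_labels(codes):
    """Return the labels used by more than one code of a dimension.

    `codes` is a hypothetical mapping of code ID -> label (one language).
    """
    counts = Counter(label.strip() for label in codes.values())
    return sorted(label for label, n in counts.items() if n > 1)


def missing_unit_concepts(annotation_value):
    """Return the required IDs that the annotation does not list."""
    return sorted(REQUIRED_UNIT_CONCEPTS - parse_unit_concepts(annotation_value))


def stray_unit_dimensions(dimension_ids, annotation_value):
    """Return dimension IDs that contain "UNIT" but are not listed in the annotation."""
    listed = parse_unit_concepts(annotation_value)
    return [d for d in dimension_ids if "UNIT" in d.upper() and d not in listed]


def blank_unit_observations(observations, unit_key="UNIT_MEASURE"):
    """Return the positions of observations whose unit is missing or blank.

    `observations` is a hypothetical list of dicts, one per observation.
    """
    return [
        i for i, obs in enumerate(observations)
        if not (obs.get(unit_key) or "").strip()
    ]
```

Each function returns the offending items rather than a boolean, so that a caller can list them in a report; an empty result means the check passed.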
The checks above should run when a dataset is created or updated, and should also be available as a report that can be run on an existing dataset.
Check | Must-have (Error) | Warning/Info |
---|---|---|
No blank units for any observation | x | |
Common units only | x | |
A dataflow-level annotation UNIT_MEASURE_CONCEPTS | x | |
Common list for time dimension must be respected | x | |
Dimension IDs outside UNIT def. without “UNIT” term | x | |
SDMX time dim. and OECD:FREQ dim. must be used | x | |
No duplication of labels in a dimension | x | |
Dataflow referential metadata in all DE languages | x | |
Generic attributes used correctly | x | |
Non-generic attributes must have labels | x | |
All structure information in all DE languages | x | |
Number of data points in dataflow | x | |
Base period specified for unit 'Index' | x | |
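
One possible way to turn the table into a report is sketched below. It assumes each check is a callable that returns a list of problem descriptions, and that the severity column maps to the strings "error" and "warning". The `Finding` class, `run_report` and `is_open_ready` are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Finding:
    check: str          # name of the check, as in the first table column
    severity: str       # "error" (must-have) or "warning" (warning/info)
    details: List[str]  # descriptions of the offending items


def run_report(checks: List[Tuple[str, str, Callable[[], List[str]]]]) -> List[Finding]:
    """Run every check and keep only those that reported problems."""
    findings = []
    for name, severity, check in checks:
        details = list(check())
        if details:
            findings.append(Finding(name, severity, details))
    return findings


def is_open_ready(findings: List[Finding]) -> bool:
    """A dataset is considered ready only if no must-have check failed."""
    return not any(f.severity == "error" for f in findings)
```

With this shape, the same list of checks can be run on creation or update of a dataset (rejecting it when `is_open_ready` is false) or on demand against an existing dataset, with warnings reported but not blocking.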
Edited by Gyorgy Gyomai