6.0 Allow for non-coded non-time dimensions in .Stat CORE - microdata enhancement
The objective of this feature is to enable the use of **non-coded numeric or non-numeric values for non-time dimensions** in .Stat CORE.
The DSD contains the representation defined for the dimensions. For more information about the required representations, see the rules below.
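To show where this representation lives in a DSD, the following is a minimal Python sketch, not .Stat Suite code. It reads an SDMX-ML 2.1 DSD and reports, for each non-time dimension, whether it is coded (`Enumeration`) or non-coded (`TextFormat` with its facets). It assumes the SDMX-ML 2.1 structure namespace and only looks at the dimension's `LocalRepresentation`. Representations inherited from the concept are not resolved. The function name `classify_dimensions` and the sample DSD are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the DSD is an SDMX-ML 2.1 structure message.
STR = "{http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure}"


def classify_dimensions(dsd_xml: str) -> dict:
    """Map each non-time dimension id to its local representation (hypothetical helper)."""
    root = ET.fromstring(dsd_xml.strip())
    result = {}
    # iter() matches the exact tag, so structure:TimeDimension is skipped here.
    for dim in root.iter(STR + "Dimension"):
        rep = dim.find(STR + "LocalRepresentation")
        if rep is None:
            # Representation inherited from the concept: not resolved in this sketch.
            result[dim.get("id")] = {"coded": None}
        elif rep.find(STR + "Enumeration") is not None:
            result[dim.get("id")] = {"coded": True}
        else:
            fmt = rep.find(STR + "TextFormat")
            facets = dict(fmt.attrib) if fmt is not None else {}
            facets.setdefault("textType", "String")  # SDMX default textType
            result[dim.get("id")] = {"coded": False, **facets}
    return result


SAMPLE_DSD = """
<structure:DataStructure id="DSD_DEMO"
    xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure">
  <structure:DataStructureComponents>
    <structure:DimensionList id="DimensionDescriptor">
      <structure:Dimension id="REF_AREA" position="1">
        <structure:LocalRepresentation>
          <structure:Enumeration><URN>urn:example</URN></structure:Enumeration>
        </structure:LocalRepresentation>
      </structure:Dimension>
      <structure:Dimension id="MD_ID" position="2">
        <structure:LocalRepresentation>
          <structure:TextFormat textType="BigInteger"/>
        </structure:LocalRepresentation>
      </structure:Dimension>
      <structure:TimeDimension id="TIME_PERIOD" position="3"/>
    </structure:DimensionList>
  </structure:DataStructureComponents>
</structure:DataStructure>
"""

if __name__ == "__main__":
    print(classify_dimensions(SAMPLE_DSD))
```

On the sample, `MD_ID` is reported as non-coded with `textType` `BigInteger`. This is the case that the upload currently rejects.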
- Check whether the NSI structure upload (for DSDs) allows defining the required SDMX representations (see below) for non-coded numeric or non-numeric dimensions, and correct it if necessary.
- Check that the NSI data extractions still work as expected, and correct them if necessary.
- [-] Use the DSD annotation `MAXTEXTDIMENSIONLENGTH`, with the corresponding default value from the configuration setting `MaxTextDimensionLength` (set to 150 by default), to optimise the SQL table storage size.
- Use the `maxLength` property of the dimension's `TextFormat` representation to adapt the SQL table storage size (as is currently done for the measure).
- Change the transfer-service import, transfer and storage to allow the SDMX types listed below for dimension values. Incoming values must respect the defined representation (related in-depth validations can be done later in a separate ticket). For import and extraction performance and/or database size optimisation, the database storage can be made more generic and does not need to follow the SDMX representations, as long as the extracted values are still valid.
- Provide an SQL script to migrate existing users.
- Update the data database model documentation.
- PM: Document the hard limit of 4000 characters for the maximum length of non-coded dimensions. (This limit is imposed by SQL Server on searchable columns, as used for all dimensions, that participate in the PRIMARY KEY.)
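The storage-size items above could be combined along the lines of the following Python sketch. It assumes an order of precedence that the ticket does not fix: the `TextFormat` `maxLength` first, then the `MAXTEXTDIMENSIONLENGTH` annotation, then the `MaxTextDimensionLength` configuration default of 150. The 4000-character hard limit serves as the upper bound. The helper name `dimension_column_type` and the `NVARCHAR` column type are illustrative assumptions, not the actual .Stat CORE data model.

```python
from typing import Optional

CONFIG_MAX_TEXT_DIMENSION_LENGTH = 150  # default of the MaxTextDimensionLength setting
HARD_LIMIT = 4000  # documented hard limit for non-coded dimension values


def dimension_column_type(max_length: Optional[int] = None,
                          annotation_length: Optional[int] = None,
                          config_default: int = CONFIG_MAX_TEXT_DIMENSION_LENGTH) -> str:
    """Return a SQL Server column type for a non-coded dimension (hypothetical helper).

    Assumed precedence: TextFormat maxLength, then the MAXTEXTDIMENSIONLENGTH
    annotation, then the configuration default.
    """
    if max_length is not None:
        length = max_length
    elif annotation_length is not None:
        length = annotation_length
    else:
        length = config_default
    if length < 1:
        raise ValueError(f"invalid length {length}: must be a positive integer")
    if length > HARD_LIMIT:
        raise ValueError(f"length {length} exceeds the hard limit of {HARD_LIMIT} characters")
    return f"NVARCHAR({length})"


if __name__ == "__main__":
    print(dimension_column_type())                       # configuration default
    print(dimension_column_type(annotation_length=300))  # annotation on the DSD
    print(dimension_column_type(max_length=20))          # TextFormat maxLength wins
```

Rejecting lengths above the hard limit at structure upload, rather than truncating, is one possible design choice. It keeps the failure visible before any data is loaded.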
Note that validating that values respect the representations goes together with the new validation methods implemented in #123 (closed).
Check whether this ticket should be implemented together with dotstatsuite-core-sdmxri-nsi-ws#303 (closed).
SDMX standard: possible types (representations) for dimensions
TextFormat: TextFormat describes an uncoded textual format.
- The possible values for its optional attribute "textType" are:
- "String": A string datatype corresponding to W3C XML Schema's xs:string datatype (default)
- "Alpha": A string datatype which only allows for the simple alphabetic character set of A-Z, a-z.
- "AlphaNumeric": A string datatype which only allows for the simple alphabetic character set of A-Z, a-z plus the simple numeric character set of 0-9.
- "Numeric": A string datatype which only allows for the simple numeric character set of 0-9. This format is not treated as an integer, and therefore can have leading zeros.
- "BigInteger": An integer datatype corresponding to W3C XML Schema's xs:integer datatype.
- "Integer": An integer datatype corresponding to W3C XML Schema's xs:int datatype.
- "Long": A numeric datatype corresponding to W3C XML Schema's xs:long datatype.
- "Short": A numeric datatype corresponding to W3C XML Schema's xs:short datatype.
- "Decimal": A numeric datatype corresponding to W3C XML Schema's xs:decimal datatype.
- "Float": A numeric datatype corresponding to W3C XML Schema's xs:float datatype.
- "Double": A numeric datatype corresponding to W3C XML Schema's xs:double datatype.
- "Boolean": A datatype corresponding to W3C XML Schema's xs:boolean datatype.
- "URI": A datatype corresponding to W3C XML Schema's xs:anyURI datatype.
- "Count": A simple incrementing Integer type. The isSequence facet must be set to true, and the interval facet must be set to "1".
- "ObservationalTimePeriod": Observational time periods are the superset of all time periods in SDMX. It is the union of the standard time periods (i.e. Gregorian time periods, the reporting time periods, and date time) and a time range.
- "StandardTimePeriod": Standard time periods are a superset of distinct time periods in SDMX. They are the union of the basic time periods (i.e. the Gregorian time periods and date time) and the reporting time periods.
- "BasicTimePeriod": BasicTimePeriod time periods are a superset of the Gregorian time periods and a date time.
- "GregorianTimePeriod": Gregorian time periods correspond to calendar periods and are represented in ISO-8601 formats. This is the union of the year, year month, and date formats.
- "GregorianYear": A Gregorian time period corresponding to W3C XML Schema's xs:gYear datatype, which is based on ISO-8601.
- "GregorianYearMonth": A time datatype corresponding to W3C XML Schema's xs:gYearMonth datatype, which is based on ISO-8601.
- "GregorianDay": A time datatype corresponding to W3C XML Schema's xs:date datatype, which is based on ISO-8601.
- "ReportingTimePeriod": Reporting time periods represent periods of a standard length within a reporting year, where the start of the year (defined as a month and day) must be defined elsewhere or it is assumed to be January 1. This is the union of the reporting year, semester, trimester, quarter, month, week, and day.
- "ReportingYear": A reporting year represents a period of 1 year (P1Y) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingYearType.
- "ReportingSemester": A reporting semester represents a period of 6 months (P6M) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingSemesterType.
- "ReportingTrimester": A reporting trimester represents a period of 4 months (P4M) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingTrimesterType.
- "ReportingQuarter": A reporting quarter represents a period of 3 months (P3M) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingQuarterType.
- "ReportingMonth": A reporting month represents a period of 1 month (P1M) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingMonthType.
- "ReportingWeek": A reporting week represents a period of 7 days (P7D) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingWeekType.
- "ReportingDay": A reporting day represents a period of 1 day (P1D) from the start date of the reporting year. This is expressed using the SDMX-specific ReportingDayType.
- "DateTime": A time datatype corresponding to W3C XML Schema's xs:dateTime datatype.
- "TimeRange": TimeRange defines a time period by providing a distinct start (date or date time) and a duration.
- "Month": A time datatype corresponding to W3C XML Schema's xs:gMonth datatype.
- "MonthDay": A time datatype corresponding to W3C XML Schema's xs:gMonthDay datatype.
- "Day": A time datatype corresponding to W3C XML Schema's xs:gDay datatype.
- "Time": A time datatype corresponding to W3C XML Schema's xs:time datatype.
- "Duration": A time datatype corresponding to W3C XML Schema's xs:duration datatype.
Example: `<structure:TextFormat textType="BigInteger"/>`
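The in-depth validation is deferred to a separate ticket. As a rough illustration, a value check for a subset of the text types listed above could look like the following Python sketch. The function name `is_valid_text_type` is hypothetical. The integer ranges follow the XML Schema definitions of xs:int, xs:long and xs:short. The lexical patterns are simplified, and the time-related types are deliberately left out.

```python
import re

# Inclusive bounds of the XML Schema integer types (None means unbounded).
INTEGER_RANGES = {
    "BigInteger": (None, None),        # xs:integer
    "Integer": (-2**31, 2**31 - 1),    # xs:int
    "Long": (-2**63, 2**63 - 1),       # xs:long
    "Short": (-2**15, 2**15 - 1),      # xs:short
}

# Simplified lexical forms for the other supported simple types.
PATTERNS = {
    "Alpha": r"[A-Za-z]+",
    "AlphaNumeric": r"[A-Za-z0-9]+",
    "Numeric": r"[0-9]+",  # digits only; leading zeros are kept, no numeric conversion
    "Decimal": r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)",
    "Float": r"(?:[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?|-?INF|NaN)",
    "Boolean": r"true|false|1|0",
}
PATTERNS["Double"] = PATTERNS["Float"]  # same lexical space, different precision


def is_valid_text_type(value: str, text_type: str = "String") -> bool:
    """Check a dimension value against an SDMX textType (hypothetical, partial sketch)."""
    if text_type == "String":
        return True  # any xs:string value is accepted
    if text_type in INTEGER_RANGES:
        if re.fullmatch(r"[+-]?[0-9]+", value) is None:
            return False
        low, high = INTEGER_RANGES[text_type]
        number = int(value)
        return (low is None or number >= low) and (high is None or number <= high)
    if text_type in PATTERNS:
        return re.fullmatch(PATTERNS[text_type], value) is not None
    raise NotImplementedError(f"textType {text_type!r} is not covered by this sketch")
```

"Numeric" is checked purely as a character pattern, so `"00123"` stays valid and is never converted to a number. This matches the leading-zeros remark above.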
- The other optional TextFormat attributes, which .Stat Suite should support, are:
- "minLength" type="xs:positiveInteger": The minLength attribute specifies the minimum length of the value in characters.
- "maxLength" type="xs:positiveInteger": The maxLength attribute specifies the maximum length of the value in characters.
- "minValue" type="xs:integer": The minValue attribute is used for inclusive and exclusive ranges, indicating what the lower bound of the range is. If this is used with an inclusive range, a valid value will be greater than or equal to the value specified here. If the inclusive and exclusive data type is not specified (e.g. this facet is used with an integer data type), the value is assumed to be inclusive.
- "maxValue" type="xs:integer": The maxValue attribute is used for inclusive and exclusive ranges, indicating what the upper bound of the range is. If this is used with an inclusive range, a valid value will be less than or equal to the value specified here. If the inclusive and exclusive data type is not specified (e.g. this facet is used with an integer data type), the value is assumed to be inclusive.
- "decimals" type="xs:positiveInteger": The decimals attribute indicates the number of characters allowed after the decimal separator.
- "pattern" type="xs:string": The pattern attribute holds any regular expression permitted in the similar facet in W3C XML Schema.
Example: `<structure:TextFormat textType="String" minLength="15" maxLength="15" pattern="[A-Z]{15}"/>`
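The facets above could be checked in a similar way. The following Python sketch returns the names of the facets a value violates. It makes several assumptions:

- The function name `facet_violations` is hypothetical.
- `minValue` and `maxValue` are treated as inclusive, which is the default described above.
- `decimals` is derived from the value's decimal exponent.
- `pattern` is applied with `re.fullmatch`, because XML Schema patterns are implicitly anchored to the whole value and use no `/` delimiters.

Python's `re` syntax is close to, but not identical with, XML Schema regular expressions (XML Schema has character-class subtraction, for example). A real implementation would need a compatible regex engine.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List, Optional


def facet_violations(value: str,
                     min_length: Optional[int] = None,
                     max_length: Optional[int] = None,
                     min_value: Optional[int] = None,
                     max_value: Optional[int] = None,
                     decimals: Optional[int] = None,
                     pattern: Optional[str] = None) -> List[str]:
    """Return the names of the TextFormat facets that `value` violates (hypothetical helper)."""
    violations = []
    if min_length is not None and len(value) < min_length:
        violations.append("minLength")
    if max_length is not None and len(value) > max_length:
        violations.append("maxLength")

    if min_value is not None or max_value is not None or decimals is not None:
        try:
            number = Decimal(value)
        except InvalidOperation:
            number = None
        if number is None or not number.is_finite():
            violations.append("numeric")  # numeric facets need a finite number
        else:
            if min_value is not None and number < min_value:  # inclusive bound
                violations.append("minValue")
            if max_value is not None and number > max_value:  # inclusive bound
                violations.append("maxValue")
            # Digits after the decimal separator, e.g. "12.345" -> exponent -3 -> 3.
            exponent = number.as_tuple().exponent
            if decimals is not None and -exponent > decimals:
                violations.append("decimals")

    # XML Schema patterns must match the whole value, hence fullmatch.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        violations.append("pattern")
    return violations
```

For instance, `facet_violations("12.345", min_value=0, max_value=100, decimals=2)` returns `["decimals"]`.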
Other info (features not required)
- Note that the following other values for its optional attribute "textType" do not need to be supported:
- "InclusiveValueRange": This value indicates that the startValue and endValue attributes provide the inclusive boundaries of a numeric range of type xs:decimal.
- "ExclusiveValueRange": This value indicates that the startValue and endValue attributes provide the exclusive boundaries of a numeric range, of type xs:decimal.
- "Incremental": This value indicates that the value increments according to the value provided in the interval facet, and has a true value for the isSequence facet.
- Note that the following other optional TextFormat attributes do not need to be supported:
- "isSequence" type="xs:boolean": The isSequence attribute indicates whether the values are intended to be ordered, and it may work in combination with the interval, startValue, and endValue attributes or the timeInterval, startTime, and endTime attributes. If this attribute holds a value of true, a start value or time and a numeric or time interval must be supplied. If an end value is not given, then the sequence continues indefinitely.
- "interval" type="xs:integer": The interval attribute specifies the permitted interval (increment) in a sequence. In order for this to be used, the isSequence attribute must have a value of true.
- "startValue" type="xs:integer": The startValue attribute is used in conjunction with the isSequence and interval attributes (which must be set in order to use this attribute). This attribute is used for a numeric sequence, and indicates the starting point of the sequence. This value is mandatory for a numeric sequence to be expressed.
- "endValue" type="xs:integer": The endValue attribute is used in conjunction with the isSequence and interval attributes (which must be set in order to use this attribute). This attribute is used for a numeric sequence, and indicates the ending point (if any) of the sequence.
- "timeInterval" type="xs:duration": The timeInterval attribute indicates the permitted duration in a time sequence. In order for this to be used, the isSequence attribute must have a value of true.
- "startTime" type="common:StandardTimePeriodType": The startTime attribute is used in conjunction with the isSequence and timeInterval attributes (which must be set in order to use this attribute). This attribute is used for a time sequence, and indicates the start time of the sequence. This value is mandatory for a time sequence to be expressed.
- "endTime" type="common:StandardTimePeriodType": The endTime attribute is used in conjunction with the isSequence and timeInterval attributes (which must be set in order to use this attribute). This attribute is used for a time sequence, and indicates the ending point (if any) of the sequence.
Enumeration (already implemented): Enumeration references a codelist that enumerates the allowable values for this representation.
- Example: `<structure:Enumeration><URN>urn:sdmx:org.sdmx.infomodel.codelist.Codelist=MYAGENCY:CL_OBS_VALUES(1.0)</URN></structure:Enumeration>`
EnumerationFormat (can for the moment be ignored by the Transfer service): EnumerationFormat describes the facets of the item scheme enumeration. Only facets and text types applicable to codes are allowed.
- These are the optional EnumerationFormat properties: "textType" (with the same possible values as the attribute "textType" of the TextFormat above), "minLength", "maxLength", "minValue", "maxValue", "pattern", as well as "isSequence", "interval", "startValue", "endValue", "timeInterval", "startTime" and "endTime".
- Example: `<structure:Enumeration><URN>urn</URN></structure:Enumeration><structure:EnumerationFormat textType="AlphaNumeric" minLength="1681372040" maxLength="472688068" minValue="1014747829" maxValue="-566152027" pattern="sfzOK-PED"/>`
Examples:
- CRS1 (non-remodelled) with `MD_ID` dimension being a string:
  - Structure: crs1_Multiple.xml
  - Data (full - zipped): https://oecd-my.sharepoint.com/:u:/g/personal/anastassia_samsonova_oecd_org/EQ0sOhtSx7NLq_FmIQllkToBNeTW0HuuwINrp-tJ36ACrA?e=RIGeIQ
  - Data (small sample): OECD-DF_CRS1-1.0-data_sample.csv
- CRS1 (non-remodelled) with `MD_ID` dimension being an integer:
  - Structure: crs1_Multiple__1_.xml
  - Data (full - zipped): same as above
  - Data (small sample): same as above
Currently, the data upload fails with the error "Dimension MD_ID has no codelist representation. Note that non-coded dimensions are currently not supported. Recreate your data structure using only dimensions defined by codelists."