Efficient data archiving (compress data)
As Jens,
I want to be able to compress a specific .Stat CORE dataspace as a whole,
So that, in the specific context of archiving, it uses only minimal disk space.
Technical approach
- A new configuration setting Archived (boolean, default false) controls the compression behaviour of the dataspace.
- All existing functionality remains available for archived dataspaces.
- A dataspace can already contain data loaded while configured as non-archived and later be reconfigured as archived. In that case, the data/referential metadata will be compressed after the next init/dataflow, transfer or import.
- Once the data/referential metadata is compressed, there is no implementation to decompress it. In other words, if a dataspace configured as archived, which already contains compressed data/referential metadata, is reconfigured as non-archived, the data/referential metadata will remain compressed.
- Compressed dataspaces do not use unique constraints to guarantee unique observations. This is achieved instead by validating duplicates while importing/transferring data and referential metadata (see the sketch below).
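To illustrate the duplicate validation mentioned in the last point, here is a minimal T-SQL sketch; the staging table [staging].[OBSERVATIONS] and the key columns SDMX_KEY and PERIOD are hypothetical placeholders, not the actual .Stat CORE schema:

```sql
-- Hypothetical duplicate check run during import/transfer, replacing the
-- unique constraint: the incoming batch is rejected when any observation
-- key occurs more than once. All identifiers below are illustrative.
SELECT SDMX_KEY, PERIOD, COUNT(*) AS occurrences
FROM [staging].[OBSERVATIONS]
GROUP BY SDMX_KEY, PERIOD
HAVING COUNT(*) > 1;
```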
Testing approach
- qa:reset will be configured as an archived dataspace.
- The metric to test the feature is the amount of disk space used by the data database to store the data/referential metadata. It is estimated to be ~20% of the original space.
- To verify this, PM can compare the value of the field "qa:reset->dataDb->data" to that of a non-archived dataspace with the same data/referential metadata loaded into both, e.g. staging:SIS-CC-reset (an equivalent database-level check is sketched below).
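As a database-level counterpart to the field comparison above, the following hedged T-SQL sketch reports reserved space per table; it assumes direct access to each dataspace's data database and uses only standard SQL Server catalog views:

```sql
-- Sketch: total reserved space per table (8 KB pages) in the current data DB.
-- Run it against the archived and the non-archived dataspace's databases and
-- compare the totals; ~20% of the original space is expected when archived.
SELECT
    s.name + '.' + t.name                    AS table_name,
    SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;
```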
NOTE that this feature does not include compression of the NSIWS structure database (mappingstore db).
The tasks below will be done in a separate ticket.
### New tasks
- To further reduce the space used by the entire data database, add compression to the following tables (see the sketch below):
  - [Management].[DSD_TRANSACTION]
  - [Management].[LOGS]
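A possible implementation of this task, as a hedged sketch: it assumes SQL Server's built-in data compression is the intended mechanism, and PAGE compression is chosen purely for illustration (ROW compression or a columnstore index may turn out preferable after measuring the gain):

```sql
-- Sketch only: enable page compression on the two Management tables.
-- DATA_COMPRESSION = PAGE is an assumption, not a decision from this ticket.
ALTER TABLE [Management].[DSD_TRANSACTION] REBUILD WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE [Management].[LOGS] REBUILD WITH (DATA_COMPRESSION = PAGE);
```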