Change logic to calculate ActualCC during data imports
The step that re-calculates the actual content constraint (ActualCC) during the data import/transfer process accounts for ~30% of the total time. As discussed with the team, the SQL query can be improved by removing the joins with the FILTER table (for non-time dimensions).
In the past, this join was required because deletions left the FILTER table intact. This meant that the FILTER table could still list series whose observations had all been deleted.
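
A minimal sketch of why the join was needed, using an in-memory SQLite database. The table and column names (`series_filter`, `fact_a`, `sid`, `dim_freq`) are hypothetical and do not reflect the real schema; the point is only that a stale FILTER row makes a FILTER-only query over-report dimension values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the FILTER table and one fact table.
    CREATE TABLE series_filter (sid INTEGER PRIMARY KEY, dim_freq TEXT);
    CREATE TABLE fact_a (sid INTEGER, period TEXT, value REAL);

    -- Series 2 is still listed, but all of its observations were deleted.
    INSERT INTO series_filter VALUES (1, 'A'), (2, 'M');
    INSERT INTO fact_a VALUES (1, '2020', 1.5);
""")

# Reading the FILTER table alone also returns the value of the emptied series.
filter_only = {
    row[0] for row in conn.execute("SELECT DISTINCT dim_freq FROM series_filter")
}

# Joining with the fact table keeps only series that still have observations.
joined = {
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT f.dim_freq "
        "FROM series_filter f JOIN fact_a o ON o.sid = f.sid"
    )
}
```

If the FILTER table itself recorded which series still have observations, the cheaper FILTER-only query could presumably return the same result as the join.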
As discussed with the team, the options could be:
- Maintain different FILTER table versions, one per fact table version with non-deleted series (A/B sets) and one per fact table version with deleted series (A/B sets).
- Include an extra column in the FILTER table, indicating whether the series is used:
  - in A table with non-deleted values
  - in A table only with deleted values
  - in B table with non-deleted values
  - in B table only with deleted values
  - in A and B table with non-deleted values
  - in A and B table only with deleted values
  - in A table with non-deleted values and in B table with only deleted values
  - in B table with non-deleted values and in A table with only deleted values

There is also the possibility to defer this step and run it as a background task.
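
One possible way to encode the extra-column option, sketched in Python. The eight cases listed are exactly the combinations of a three-valued state per fact table (absent, non-deleted values, only deleted values), minus "absent from both". All names here (`TableUse`, `encode`, `decode`) are hypothetical and only illustrate the idea; the actual column type and values remain to be decided.

```python
from enum import IntEnum
from itertools import product
from typing import Tuple


class TableUse(IntEnum):
    """How one fact table version (A or B) uses a series (hypothetical)."""
    ABSENT = 0        # series does not appear in this table
    LIVE = 1          # series has non-deleted values
    DELETED_ONLY = 2  # series has only deleted values


def encode(a: TableUse, b: TableUse) -> int:
    """Pack the A and B states into one small integer (0..8)."""
    return int(a) * 3 + int(b)


def decode(code: int) -> Tuple[TableUse, TableUse]:
    """Unpack a stored code into (A state, B state)."""
    return TableUse(code // 3), TableUse(code % 3)


# Every combination except "absent from both" is one of the eight listed cases.
VALID_CODES = sorted(
    encode(a, b)
    for a, b in product(TableUse, repeat=2)
    if (a, b) != (TableUse.ABSENT, TableUse.ABSENT)
)
```

With such a column, the ActualCC calculation for the A table would only need the FILTER rows whose A state is `LIVE`, without joining the fact table.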
- Analyse/implement the best approach.
- Include the SQL migration scripts if needed.
THE CHANGE MIGHT NOT BE REQUIRED if the linked ticket can replace this functionality. If this is the case:
- Make the step that calculates the actual content constraint optional (default -> no) during data imports. The purpose is to avoid unnecessary re-calculations if the user intends to quickly upload other data. However, this impacts the data integrity of the system and would give wrong information to clients that use the ActualCC.
  --> JENS: From the PM side, this doesn't seem like a useful option, because users are not supposed to know anything about the ActualCC.
ALSO CHECK HOW THE IMPLEMENTATION IS LINKED TO THIS TICKET:
#337