Backfill UserAddOnAssignmentVersion table with historical data from UserAddOnAssignment
Summary
Create a data migration to backfill the UserAddOnAssignmentVersion table with historical data from the UserAddOnAssignment table for assignments created before 2024-11-07. This is needed to ensure AI metrics and analytics have complete historical data.
Background
The UserAddOnAssignmentVersion model with PaperTrail tracking was introduced on 2024-11-07 via !169180 (merged). As a result, add-on assignments created before this date have no corresponding version records, which affects:
- AI Impact Analytics that rely on historical assignment data
- ClickHouse synchronization for historical metrics
- Accurate reporting of add-on usage over time
As discussed in #498633 (comment 2595465956), we need to backfill the _versions table for assignments that don't have any version records.
Proposal
- Create a data migration that:
  - Identifies UserAddOnAssignment records without corresponding UserAddOnAssignmentVersion entries
  - Creates UserAddOnAssignmentVersion records with:
    - event = 'created'
    - Proper created_at timestamp from the original assignment
    - All necessary fields to maintain data integrity
- Ensure proper synchronization:
  - Verify that the backfilled records are properly synced to ClickHouse
  - Update sync cursors if necessary to ensure complete data transfer
- Data validation:
  - Compare ClickHouse data with PostgreSQL data for add-on assignments under gitlab-org
  - Ensure data accuracy and completeness after backfill
- Documentation:
  - Document the backfill process and any limitations
  - Update AI Impact Analytics documentation if needed
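The core of the first proposal item is an anti-join: insert one 'created' version for every pre-cutoff assignment that has no version row at all. The sketch below illustrates that logic against an in-memory SQLite database. The table names (assignments, assignment_versions), the column set, and the helper names are simplified assumptions for illustration only; they are not the production schema, and the real migration would be written against PostgreSQL in the application's migration framework.

```python
import sqlite3

# Date version tracking was introduced, per the issue text.
CUTOFF = "2024-11-07"


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assignments (
            id INTEGER PRIMARY KEY, user_id INTEGER, add_on_name TEXT, created_at TEXT
        );
        CREATE TABLE assignment_versions (
            id INTEGER PRIMARY KEY, item_id INTEGER, event TEXT,
            user_id INTEGER, add_on_name TEXT, created_at TEXT
        );
        INSERT INTO assignments VALUES
            (1, 10, 'code_suggestions', '2024-06-01 09:00:00'),
            (2, 11, 'code_suggestions', '2024-10-15 12:30:00'),
            (3, 12, 'duo_enterprise',   '2024-11-20 08:00:00');
        -- Assignments 2 and 3 already have a version row; assignment 1 does not.
        INSERT INTO assignment_versions (item_id, event, user_id, add_on_name, created_at) VALUES
            (2, 'created', 11, 'code_suggestions', '2024-10-15 12:30:00'),
            (3, 'created', 12, 'duo_enterprise',   '2024-11-20 08:00:00');
        """
    )
    return conn


def backfill_created_versions(conn: sqlite3.Connection) -> int:
    """Insert a 'created' version for each pre-cutoff assignment with no versions.

    Returns the number of rows inserted. The NOT EXISTS guard makes reruns no-ops.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO assignment_versions (item_id, event, user_id, add_on_name, created_at)
        SELECT a.id, 'created', a.user_id, a.add_on_name, a.created_at
        FROM assignments a
        WHERE a.created_at < ?
          AND NOT EXISTS (
              SELECT 1 FROM assignment_versions v WHERE v.item_id = a.id
          )
        """,
        (CUTOFF,),
    )
    conn.commit()
    return conn.total_changes - before
```

Because the version row copies created_at from the assignment rather than using the time of the backfill, historical metrics keyed on that timestamp should line up with the original assignment dates.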
Acceptance Criteria
- Data migration successfully creates UserAddOnAssignmentVersion records for all UserAddOnAssignment records created before 2024-11-07
- Backfilled records have correct created_at timestamps and event='created'
- All backfilled data is properly synchronized to ClickHouse
- AI metrics show consistent historical data after backfill
- Data validation confirms accuracy between PG and ClickHouse
- Documentation is updated to reflect the backfill process
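For the PG versus ClickHouse validation criterion, one simple approach is to export the same columns from both stores and diff the two sets. The sketch below assumes each export has already been reduced to (item_id, event, created_at) tuples with timestamps normalized to the same format and time zone; how the rows are fetched from either database is out of scope, and the type and function names are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class VersionRow(NamedTuple):
    item_id: int
    event: str
    created_at: str  # assumed to be normalized identically in both exports


class ExportDiff(NamedTuple):
    missing_in_clickhouse: Set[VersionRow]
    unexpected_in_clickhouse: Set[VersionRow]


def diff_exports(pg_rows: Iterable[VersionRow], ch_rows: Iterable[VersionRow]) -> ExportDiff:
    """Compare two exports row by row.

    A row whose timestamp differs between the stores shows up on both sides:
    the PostgreSQL variant as missing, the ClickHouse variant as unexpected.
    """
    pg, ch = set(pg_rows), set(ch_rows)
    return ExportDiff(missing_in_clickhouse=pg - ch, unexpected_in_clickhouse=ch - pg)
```

An empty diff on both sides for the gitlab-org assignments would be one way to satisfy the criterion; a non-empty missing_in_clickhouse set would suggest the sync cursor needs attention.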
Technical Notes
- Reference query for estimating records: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/41063/commands/126208
- The backfill should preserve the original created_at timestamps from the UserAddOnAssignment table
- Ensure the migration is safe for large datasets and can be run in production
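To keep the backfill safe on a large table, the work is usually split into small id ranges, each committed in its own short transaction, so that no single statement holds locks for long and an interrupted run can simply be restarted. The sketch below shows that pattern with keyset pagination over an in-memory SQLite database. The schema, table names, and function names are hypothetical simplifications; in the real codebase a batched background migration would normally provide the batching, and this only illustrates the idea.

```python
import sqlite3

CUTOFF = "2024-11-07"  # date version tracking was introduced, per the issue text


def make_batch_demo_db() -> sqlite3.Connection:
    """Hypothetical minimal schema: five pre-cutoff assignments and one post-cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assignments (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE assignment_versions (
            id INTEGER PRIMARY KEY, item_id INTEGER, event TEXT, created_at TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO assignments VALUES (?, ?)",
        [(i, f"2024-0{i}-01 00:00:00") for i in range(1, 6)] + [(6, "2024-12-01 00:00:00")],
    )
    # Assignments 3 and 6 already have a version row and must be skipped.
    conn.executemany(
        "INSERT INTO assignment_versions (item_id, event, created_at) VALUES (?, 'created', ?)",
        [(3, "2024-03-01 00:00:00"), (6, "2024-12-01 00:00:00")],
    )
    conn.commit()
    return conn


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Backfill 'created' versions in id-ordered batches; returns rows inserted."""
    last_id, inserted = 0, 0
    while True:
        # Keyset pagination: fetch the next slice of candidate ids after last_id.
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM assignments WHERE id > ? AND created_at < ? "
                "ORDER BY id LIMIT ?",
                (last_id, CUTOFF, batch_size),
            )
        ]
        if not ids:
            return inserted
        with conn:  # one short transaction per batch
            before = conn.total_changes
            conn.execute(
                """
                INSERT INTO assignment_versions (item_id, event, created_at)
                SELECT a.id, 'created', a.created_at
                FROM assignments a
                WHERE a.id BETWEEN ? AND ?
                  AND a.created_at < ?
                  AND NOT EXISTS (
                      SELECT 1 FROM assignment_versions v WHERE v.item_id = a.id
                  )
                """,
                (ids[0], ids[-1], CUTOFF),
            )
            inserted += conn.total_changes - before
        last_id = ids[-1]
```

The NOT EXISTS guard keeps each batch idempotent, so rerunning after a failure does not create duplicate version rows.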
Related Issues
- Parent issue: #498633 (closed)
- Related to ClickHouse sync issues and AI metrics accuracy