Validate data integrity when syncing epics to work items
## Problem statement
As part of the migration of epics to work items, we have the following mechanisms to sync the data:
- Create a work item on creation of an epic
- Update the work item once an epic gets updated
- Migrate all existing epics to work items in a background migration
When all three parts are rolled out and the migrations are completed, the database should be in a state where the data of epics and work items is identical. But how can we validate this and be confident that we can switch to writing only to work items?
## How to validate
### 1. Data normalizer
We first need to write a normalizer that brings the data of a work item and its epic into a single comparable format.
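As a minimal sketch of what such a normalizer could look like (the class name, the attribute list, and the string trimming are assumptions for illustration, not the final field mapping):

```ruby
# Hypothetical normalizer: maps an epic or a work item to a plain hash of
# comparable attributes so the two records can be diffed directly.
class EpicWorkItemNormalizer
  # Attributes we assume should match between an epic and its work item.
  COMPARABLE_ATTRIBUTES = %i[title description state created_at updated_at].freeze

  # Accepts anything that responds to the comparable attributes
  # (an Epic, a WorkItem, or a plain Struct in a test).
  def normalize(record)
    COMPARABLE_ATTRIBUTES.each_with_object({}) do |attribute, normalized|
      value = record.public_send(attribute)
      # Strip strings so trailing whitespace doesn't produce false mismatches.
      normalized[attribute] = value.is_a?(String) ? value.strip : value
    end
  end
end
```

Keeping the attribute list in one place means the comparison and the error logging below can share it.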
### 2. Compare normalized data
We need a way to validate each epic and work item pair. There are multiple points at which we could do this (a comparison sketch follows the list):
1. When backfilling the data:
In the background migration, we'd validate each work item and epic after the backfill has completed.
- Pro: We need to perform the backfill anyway and are therefore already querying each epic.
- Con: It does not guarantee that syncing of updates works.
2. Periodic checks
We'd schedule a job once per week that goes through all epics and work items and validates their data integrity.
- Pro: This would verify that both the backfill and the update syncing work.
- Con: It is more effort and puts extra load on our database. It's also more complex to build, especially because Sidekiq jobs time out quickly and we'd need to spawn multiple of them to batch the validation.
3. Event-based checks
We'd emit an event every time the data gets synced due to the creation or update of an epic, and validate the pair then. This solution needs to be combined with the check during the backfill, because otherwise we'd miss the initially backfilled data.
- Pro: Only checks the integrity when necessary.
- Con: We'd enqueue lots of extra Sidekiq jobs, one per update, which puts unnecessary pressure on our background job system.
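Whichever trigger we pick, the comparison step itself is the same. A minimal sketch building on the normalizer above (the class name and the return shape are illustrative):

```ruby
# Hypothetical comparator: diffs the normalized epic data against the
# normalized work item data and returns only the attributes that differ.
class EpicWorkItemComparator
  def initialize(normalizer: EpicWorkItemNormalizer.new)
    @normalizer = normalizer
  end

  # Returns an array of { attribute:, epic_value:, work_item_value: } hashes;
  # an empty array means the pair is in sync.
  def mismatches(epic, work_item)
    epic_data = @normalizer.normalize(epic)
    work_item_data = @normalizer.normalize(work_item)

    epic_data.each_with_object([]) do |(attribute, epic_value), diffs|
      work_item_value = work_item_data[attribute]
      next if epic_value == work_item_value

      diffs << { attribute: attribute, epic_value: epic_value, work_item_value: work_item_value }
    end
  end
end
```

A non-empty result is what we'd hand over to the error logging described next.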
### 3. Log errors
Once we find a mismatch, we need to log an error with the epic ID and the attributes that differ. We could either report a Sentry error or just log a structured error and build a dashboard on top of these error logs.
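As a rough sketch of what such a structured log entry could look like (the logger target, message key, and payload shape are assumptions, not the actual GitLab logging setup):

```ruby
require 'json'
require 'logger'

# Hypothetical reporter: writes one structured log line per mismatched epic
# so a dashboard can aggregate by epic ID or by attribute.
class MismatchReporter
  def initialize(logger: Logger.new($stdout))
    @logger = logger
  end

  def report(epic_id:, mismatches:)
    return if mismatches.empty?

    payload = {
      message: 'epic_work_item_sync_mismatch',
      epic_id: epic_id,
      mismatched_attributes: mismatches.map { |diff| diff[:attribute] }
    }

    @logger.error(payload.to_json)
  end
end
```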
## GitLab.com vs Self-Managed
For GitLab.com this is still relatively easy, compared to Self-Managed instances where we can't inspect the validation errors ourselves. One option would be to use Service Ping to report whether any integrity errors were found.
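As a sketch of the kind of value Service Ping could report (the metric, the class, and the way the epic/work item pairs are loaded are assumptions, not an existing Service Ping metric):

```ruby
# Hypothetical metric: counts epics whose synced work item does not match.
# How the [epic, work_item] pairs are loaded and batched is out of scope here.
class SyncIntegrityMetric
  def initialize(comparator: EpicWorkItemComparator.new)
    @comparator = comparator
  end

  def mismatched_epic_count(pairs)
    pairs.count { |epic, work_item| @comparator.mismatches(epic, work_item).any? }
  end
end
```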