Geo: Ensure UNIQUE indexes are used for resources in registry tables on the Geo Tracking Database (Part 2)
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
This is copied over from #479852 (comment 2347745762) where we split up the work into two issues.
Summary
We have identified a racing condition on the creation of some registry entries in the Geo Tracking Database.
Additional context can be seen in this customer support issue: https://gitlab.com/gitlab-com/geo-customers/-/issues/246 (Internal Link)
What is the current bug behavior?
A racing condition can happen and is confirmed in at least two registry tables:
job_artifact_registry, which include the following indexes:
CREATE INDEX index_job_artifact_registry_on_artifact_id
ON job_artifact_registry (artifact_id);
CREATE INDEX index_job_artifact_registry_on_retry_at
ON job_artifact_registry (retry_at);
CREATE INDEX job_artifact_registry_failed_verification
ON job_artifact_registry (verification_retry_at)
WHERE ((state = 2) AND (verification_state = 3));
CREATE INDEX job_artifact_registry_needs_verification
ON job_artifact_registry (verification_state)
WHERE ((state = 2) AND (verification_state = ANY (ARRAY [0, 3])));
CREATE INDEX job_artifact_registry_pending_verification
ON job_artifact_registry (verified_at)
WHERE ((state = 2) AND (verification_state = 0));
allows for multiple entries that points to the same artifact_id .
file_registry, which include the following indexes:
create unique index file_registry_pkey
on public.file_registry (id);
create index file_registry_failed_verification
on public.file_registry (verification_retry_at)
where ((state = 2) AND (verification_state = 3));
create index file_registry_needs_verification
on public.file_registry (verification_state)
where ((state = 2) AND (verification_state = ANY (ARRAY [0, 3])));
create index file_registry_pending_verification
on public.file_registry (verified_at)
where ((state = 2) AND (verification_state = 0));
create index index_file_registry_on_retry_at
on public.file_registry (retry_at);
allows for multiple entries that points to the same file_id.
What is the expected correct behavior?
Each registry entry should always point to a single resource only.
Possible fixes
We need to create unique indexes for the tables that are missing one on the corresponding resource reference. This will prevent all downstream effects of this bad data.
Because we can't do that on a system with existing duplicates, a data migration step will need to be performed to remove duplications before creating the new indexes. Follow https://docs.gitlab.com/development/database/deduplicate_database_records/#deduplicate-strategy-for-large-tables. Note that this requires 3 releases and the last one must come after a required stop.
There are exactly 3 unique constraints that we need to add:
-
package_file_idonpackage_file_registry -
artifact_idonjob_artifact_registry -
file_idonfile_registry
Workarounds
To delete duplicate registry rows: #479131 (closed)
Implementation plan
MR 3: Delete pre-existing duplicates and add unique indexes
- Follow the documented process https://docs.gitlab.com/development/database/deduplicate_database_records/#deduplicate-strategy-for-large-tables