Fix race condition in gitlab:db:repair_index task
What does this MR do and why?
Fixes a race condition in the gitlab:db:repair_index rake task that causes intermittent failures when repairing unique indexes.
Problem: The task deletes duplicate records and then attempts to reindex, but new duplicates can be inserted between these two operations, causing the reindex to fail with unique constraint violations.
Solution: Wrap both deduplication and reindexing in a single atomic transaction using non-concurrent index operations to maintain the lock throughout the entire process.
Closes #583776
Zendesk ticket: https://gitlab.zendesk.com/agent/tickets/678309 (Internal only)
How to set up and validate locally
1. Create test duplicates
# In Rails console
pm_package = PackageMetadata::Package.create!(purl_type: 7, name: 'test-duplicate-1')
PackageMetadata::Package.create!(purl_type: 7, name: 'test-duplicate-1') # Duplicate
# Verify duplicates exist
ActiveRecord::Base.connection.execute(<<~SQL)
SELECT purl_type, name, COUNT(*) as count, ARRAY_AGG(id ORDER BY id) as ids
FROM pm_packages
WHERE purl_type = 7 AND name = 'test-duplicate-1'
GROUP BY purl_type, name
HAVING COUNT(*) > 1;
SQL
2. Test the fix
# Run the repair task
bundle exec rake gitlab:db:repair_index
# Should output:
# Processing index 'i_pm_packages_purl_type_and_name' on table 'pm_packages'...
# Index is unique. Deduplicating and reindexing atomically...
# Acquiring lock on 'pm_packages' to prevent concurrent modifications...
# Lock acquired. Proceeding with deduplication...
# Found X duplicates in 'pm_packages' for columns: purl_type,name
# Deleting duplicate records from pm_packages...
# Deleted X duplicate records from pm_packages
# Reindexing index (non-concurrent to maintain lock)...
# Index reindexed successfully (non-concurrent).
# Deduplication and reindexing completed atomically.
3. Verify no duplicates remain
SELECT purl_type, name, COUNT(*)
FROM pm_packages
GROUP BY purl_type, name
HAVING COUNT(*) > 1;
-- Should return 0 rows
4. Verify index is valid
SELECT indexname, indexdef
FROM pg_indexes
WHERE indexname = 'i_pm_packages_purl_type_and_name';
-- Should show the index exists
Technical Details
Changes Made
-
Added non-concurrent SQL templates:
-
REINDEX_SQL_NON_CONCURRENT- for reindexing inside transactions -
CREATE_INDEX_SQL_NON_CONCURRENT- for index creation inside transactions
-
-
Created
deduplicate_and_reindexmethod:- Wraps deduplication and reindexing in a single transaction
- Maintains
SHARE ROW EXCLUSIVE MODElock throughout - Prevents race condition window
-
Added helper methods:
-
reindex_index_non_concurrent- reindex without CONCURRENTLY -
create_index_non_concurrent- create index without CONCURRENTLY
-
-
Updated main
runmethod:- Calls
deduplicate_and_reindexfor unique indexes - Ensures atomic operation
- Calls
Why Non-Concurrent?
REINDEX INDEX CONCURRENTLY cannot run inside a transaction block (PostgreSQL limitation). Since we already hold a SHARE ROW EXCLUSIVE MODE lock that blocks writes, using non-concurrent operations is appropriate:
- Writes are already blocked by our lock
- We need to stay in the transaction to prevent race condition
- Non-concurrent is faster and uses less disk space
- The lock ensures atomicity
Before vs After
Before (Race Condition):
deduplicate_data(...) # Transaction ends here, lock released
# ← Window for new duplicates to be inserted
reindex_index(...) # Fails if duplicates were inserted
After (Atomic):
connection.transaction do
LOCK TABLE
deduplicate_data(...)
reindex_index_non_concurrent(...) # Lock still held
end # Lock released only here
Performance Impact
Trade-offs:
- Eliminates race condition completely
- Guarantees atomic operation
- Faster reindex (non-concurrent)
- Uses less disk space
- Blocks writes during reindex (but already blocked during deduplication)
- Should be run during maintenance window for large tables
Recommendation: For tables with < 10,000 duplicates, the impact is minimal. For larger tables, schedule during maintenance window.
Affected Tables
All unique indexes in INDEXES_TO_REPAIR:
pm_packagespm_affected_packagesmerge_request_diff_commit_userstopicsci_refsci_resource_groupsenvironmentssbom_componentstagscontainer_repositoriesaudit_events_group_external_streaming_destinationsaudit_events_instance_external_streaming_destinations
Related Issues
- #372150 (closed) - Original index repair implementation
- #523146 (closed) - Package metadata duplicate handling
Files Changed
lib/gitlab/database/repair_index.rbchangelogs/unreleased/fix-repair-index-race-condition.yml