Integrate Collation Checker with Admin UI for Detecting PostgreSQL Index Corruption
Overview
This issue tracks the implementation of an enhanced PostgreSQL collation corruption detection system in GitLab's admin interface. The system will detect index corruption that can occur after OS upgrades due to collation changes, which particularly affects self-managed instances.
Problem Statement
When self-managed GitLab instances upgrade their operating system, the underlying PostgreSQL database can experience index corruption due to changes in collation rules provided by libc. This corruption can cause:
- Silent corruption of unique index constraints allowing duplicate entries
- Incorrect query results or performance degradation
Proposed Solution
- Implement a multi-tiered approach that:
- Checks for collation version mismatches (informational only)
-
amcheck
has been deferred for now. see comment: !198071 (comment 2644318043)- ~~Directly validates index structure using ~~
amcheck
-
We do this because simply looking for version mismatch may not give full information, if user has already ranREFRESH COLLATION
or sometimes the default version mayNULL
as stored version, which is open to interpretation.
-
- ~~Directly validates index structure using ~~
- Checks for duplicates in curated list of unique indexes
- Add an admin UI integration that allows administrators to run checks and view results.
Implementation Details
-
Enhanced CollationChecker
-
Update to detect
both structural corruption andduplicates -
Use proper transaction management for
connection
object reuse -
Organize results by database present (For instance: main, ci)
-
Show corrupted index information:
# Streamlined index info for UI display corruption_info = { 'index_name' => 'index_projects_on_name', # Essential for identification 'table_name' => 'projects', # Context for the index 'affected_columns' => 'column1, column2', # Columns with potential issues 'index_type' => 'btree', # Only btree indexes are affected 'is_unique' => true, # Critical for prioritization 'size_bytes' => 1234567, # Helps assess rebuild impact 'corruption_types' => ['structural', 'duplicates'], # Specific issues found 'needs_deduplication' => true # Action guidance }
-
Show collation mismatch information in results:
# Database-level results including collation mismatches results = { 'main' => { 'collation_mismatches' => [ { 'collation_name' => 'en_US.UTF-8', 'provider' => 'c', 'stored_version' => '2.28', 'actual_version' => '2.31' } ], 'corrupted_indexes' => [ { 'index_name' => 'index_projects_on_name', 'table_name' => 'projects', 'affected_columns' => 'name', 'index_type' => 'btree', 'is_unique' => true, 'size_bytes' => 1234567, 'corruption_types' => ['structural'], 'needs_deduplication' => false }, { 'index_name' => 'index_users_on_name', 'table_name' => 'users', 'affected_columns' => 'name', 'index_type' => 'btree', 'is_unique' => true, 'size_bytes' => 5678901, 'corruption_types' => ['structural', 'duplicates'], 'needs_deduplication' => true } ] }, 'ci' => { 'collation_mismatches' => [], 'corrupted_indexes' => [ { 'index_name' => 'index_ci_builds_on_name', 'table_name' => 'ci_builds', 'affected_columns' => 'name', 'index_type' => 'btree', 'is_unique' => false, 'size_bytes' => 9876543, 'corruption_types' => ['structural'], 'needs_deduplication' => false } ] } }
-
-
Admin UI Integration
-
Add Database Diagnostics menu item to admin sidebar under Monitoring menu
-
Run Database Diagnostics will enqueue a background job to create generate CollationChecker results and store in Redis with 1-week or day expiration (TBD on the TTL)
-
Show collation mismatch information when present (with version differences)
-
Display index corruption details and remediation steps by database
- Only link to documentation or contact support as remedy requires data modification
-
If no result is fetched on index action, simply show
Run Diagnostics
button only -
If result is fetched on index action, show the results along with timestamp when the results was generated.
-
-
Storage Considerations
- Use Redis for storing results temporarily
- We are only spot checking curated list of indexes, starting with 2 indexes first
-
JSON size estimate with streamlined data:There are currently 900 indexes related with text column and b-treeNot all of them will be corrupted, but if we take 900 as maximumBased on Napkin Math by Claude:~300 bytes per index × 900 indexes × 2 databases = ~540 KB (worst case)
Add ~100 bytes per collation mismatch information hash (negligible increase)-
An educated guess would be around~~ so10s
indexes corruption,3-5 KB
~~Reference issue: #505982
- Set 1-week expiration for automatic cleanup
-
Performance Considerations
- System catalog queries are generally fast
- Add index size information from pg_relation_size()
- LIMIT 1 for duplicate detection allows early exit
- Execute in background job to avoid blocking UI
- Estimated runtime: 2-5 minutes for large instances
Acceptance Criteria
- Enhanced CollationChecker implementation with multi-tiered detection
-
Background worker for running checks asynchronously -
Admin UI menu item for Database Diagnostics -
Results display showing: - Collation mismatches with version information when present
- Corruption by database
- Type of corruption (structural/duplicates)
- Specific remediation steps
-
Tests for new functionality -
Documentation for administrators on interpreting results
-
Implementation Notes
- The enhanced CollationChecker returns results as a nested hash with database names as keys
- Results are stored in Redis with key "gitlab:database:collation_check:latest" with expires_on
- Only B-tree indexes with text columns are checked (the only ones affected)
- Queries for duplicate check should be fast as they target system tables and use LIMIT 1 where possible
- Also show Collation mismatches information which can be valuable context even when not definitive indicators of corruption
- We will loop through all B-tree indexes for text column, see point 2