Skip to content

Integrate Collation Checker with Admin UI for Detecting PostgreSQL Index Corruption

Overview

This issue tracks the implementation of an enhanced PostgreSQL collation corruption detection system in GitLab's admin interface. The system will detect index corruption that can occur after OS upgrades due to collation changes, which particularly affects self-managed instances.

Problem Statement

When self-managed GitLab instances upgrade their operating system, the underlying PostgreSQL database can experience index corruption due to changes in collation rules provided by libc. This corruption can cause:

  • Silent corruption of unique index constraints allowing duplicate entries
  • Incorrect query results or performance degradation

Proposed Solution

  1. Implement a multi-tiered approach that:
    • Checks for collation version mismatches (informational only)
    • amcheck has been deferred for now. see comment: !198071 (comment 2644318043)
      • ~~Directly validates index structure using ~~amcheck
        • We do this because simply looking for version mismatch may not give full information, if user has already ran REFRESH COLLATION or sometimes the default version may NULL as stored version, which is open to interpretation.
    • Checks for duplicates in curated list of unique indexes
  2. Add an admin UI integration that allows administrators to run checks and view results.

Implementation Details

  1. Enhanced CollationChecker

    • Update to detect both structural corruption and duplicates

    • Use proper transaction management for connection object reuse

    • Organize results by database present (For instance: main, ci)

    • Show corrupted index information:

      # Streamlined index info for UI display
      corruption_info = {
        'index_name' => 'index_projects_on_name',           # Essential for identification
        'table_name' => 'projects',                         # Context for the index
        'affected_columns' => 'column1, column2',           # Columns with potential issues
        'index_type' => 'btree',                            # Only btree indexes are affected
        'is_unique' => true,                                # Critical for prioritization
        'size_bytes' => 1234567,                            # Helps assess rebuild impact
        'corruption_types' => ['structural', 'duplicates'], # Specific issues found
        'needs_deduplication' => true                       # Action guidance
      }
    • Show collation mismatch information in results:

      
      # Database-level results including collation mismatches
      results = {
        'main' => {
          'collation_mismatches' => [
            {
              'collation_name' => 'en_US.UTF-8',
              'provider' => 'c',
              'stored_version' => '2.28',
              'actual_version' => '2.31'
            }
          ],
          'corrupted_indexes' => [
            {
              'index_name' => 'index_projects_on_name',
              'table_name' => 'projects',
              'affected_columns' => 'name',
              'index_type' => 'btree',
              'is_unique' => true,
              'size_bytes' => 1234567,
              'corruption_types' => ['structural'],
              'needs_deduplication' => false
            },
            {
              'index_name' => 'index_users_on_name',
              'table_name' => 'users',
              'affected_columns' => 'name',
              'index_type' => 'btree',
              'is_unique' => true,
              'size_bytes' => 5678901,
              'corruption_types' => ['structural', 'duplicates'],
              'needs_deduplication' => true
            }
          ]
        },
        'ci' => {
          'collation_mismatches' => [],
          'corrupted_indexes' => [
            {
              'index_name' => 'index_ci_builds_on_name',
              'table_name' => 'ci_builds',
              'affected_columns' => 'name',
              'index_type' => 'btree',
              'is_unique' => false,
              'size_bytes' => 9876543,
              'corruption_types' => ['structural'],
              'needs_deduplication' => false
            }
          ]
        }
      }
  2. Admin UI Integration

    • Add Database Diagnostics menu item to admin sidebar under Monitoring menu

    • Run Database Diagnostics will enqueue a background job to create generate CollationChecker results and store in Redis with 1-week or day expiration (TBD on the TTL)

    • Show collation mismatch information when present (with version differences)

    • Display index corruption details and remediation steps by database

      • Only link to documentation or contact support as remedy requires data modification
    • If no result is fetched on index action, simply show Run Diagnostics button only

    • If result is fetched on index action, show the results along with timestamp when the results was generated.

      screenshot for menu location

      Screenshot 2025-07-16 at 12.04.14.png

  3. Storage Considerations

    • Use Redis for storing results temporarily
    • We are only spot checking curated list of indexes, starting with 2 indexes first
    • JSON size estimate with streamlined data:
      • There are currently 900 indexes related with text column and b-tree
      • Not all of them will be corrupted, but if we take 900 as maximum
      • Based on Napkin Math by Claude: ~300 bytes per index × 900 indexes × 2 databases = ~540 KB (worst case)
      • Add ~100 bytes per collation mismatch information hash (negligible increase)
      • An educated guess would be around 10s indexes corruption,~~ so 3-5 KB~~
    • Set 1-week expiration for automatic cleanup
  4. Performance Considerations

    • System catalog queries are generally fast
    • Add index size information from pg_relation_size()
    • LIMIT 1 for duplicate detection allows early exit
    • Execute in background job to avoid blocking UI
    • Estimated runtime: 2-5 minutes for large instances

Acceptance Criteria

  • Enhanced CollationChecker implementation with multi-tiered detection
    • Background worker for running checks asynchronously
    • Admin UI menu item for Database Diagnostics
    • Results display showing:
      • Collation mismatches with version information when present
      • Corruption by database
      • Type of corruption (structural/duplicates)
      • Specific remediation steps
    • Tests for new functionality
    • Documentation for administrators on interpreting results

Implementation Notes

  1. The enhanced CollationChecker returns results as a nested hash with database names as keys
  2. Results are stored in Redis with key "gitlab:database:collation_check:latest" with expires_on
  3. Only B-tree indexes with text columns are checked (the only ones affected)
  4. Queries for duplicate check should be fast as they target system tables and use LIMIT 1 where possible
  5. Also show Collation mismatches information which can be valuable context even when not definitive indicators of corruption
    • We will loop through all B-tree indexes for text column, see point 2

References

Edited by Bishwa Hang Rai