Skip to content

Enhance collation checker

What does this MR do and why?

Enhance collation checker

This MR enhances collation checker to with few indexes to spot check based on the issue: #505982. We diverted from the initial approach to check all the existing B-Tree indexes with text column because the query would be very expensive for large tables. Now, we start with smaller known sets of index.

The result API from CollationChecker has been updated to be used in Admin Monitoring page later (Draft MR: !198009 (merged))

We have also added a worker class that will be called when Admin runs collation check, to generate result async, for Admin Monitoring page.

References

Database

  1. Fetching Index Info: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41641/commands/127849
  2. bt_index_check
    1. index_merge_request_diff_commit_users_on_name_and_email
      1. https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41702/commands/128029
    2. index_merge_request_diff_commit_users_on_org_id_name_email
      1. https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41702/commands/128030
  3. Finding Duplicates:
    1. index_merge_request_diff_commit_users_on_name_and_email
      1. cold: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41641/commands/127879
      2. warm: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41641/commands/127899
    2. index_merge_request_diff_commit_users_on_org_id_name_email
      1. Cold: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41641/commands/127894
      2. Warm: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41641/commands/127895

How to set up and validate locally

  1. Checkout this MR branch and run gdk rails c

  2. Run the Database::CollationChecker or the Database::CollationCheckerWorker

    pry(main)> Database::CollationCheckerWorker.new.perform
    => "OK"
    pry(main)> Gitlab::Redis::SharedState.with {|r| r.get(Database::CollationCheckerWorker::COLLATION_CHECK_CACHE_KEY)}
    => "{\"metadata\":{\"last_run_at\":\"2025-07-18T08:35:08Z\"},\"databases\":{\"main\":{\"collation_mismatches\":[],\"corrupted_indexes\":[]}}}"
    pry(main)> Gitlab::Json.parse(_)
    {"metadata"=>{"last_run_at"=>"2025-07-18T08:35:08Z"}, "databases"=>{"main"=>{"collation_mismatches"=>[], "corrupted_indexes"=>[]}}}
    => {"metadata"=>{"last_run_at"=>"2025-07-18T08:35:08Z"}, "databases"=>{"main"=>{"collation_mismatches"=>[], "corrupted_indexes"=>[]}}}

To create a corruption for shake of testing:

  1. Create setup with test tables and fetch information on database and index OID

    $ gdk psql
    
    CREATE TABLE test_corruption_table (
      id serial PRIMARY KEY,
      text_col text NOT NULL
    );
    
    CREATE INDEX test_corruption_idx ON test_corruption_table (text_col);
    
    INSERT INTO test_corruption_table (text_col) 
    SELECT 'value' || i FROM generate_series(1, 100) i;
    
    -- Get the database OID
    SELECT oid, datname FROM pg_database 
    WHERE datname = current_database();
    
      oid  |       datname
    -------+----------------------
     16580 | gitlabhq_development
    (1 row)
    
    -- Get the index file information
    SELECT c.oid, c.relname, c.relkind, c.relfilenode 
    FROM pg_class c 
    WHERE c.relname = 'test_corruption_idx';
    
       oid   |       relname       | relkind | relfilenode
    ---------+---------------------+---------+-------------
     2035130 | test_corruption_idx | i       |     2035130
    (1 row)
    
    SHOW data_directory;
                               data_directory
    ---------------------------------------------------------------------
     ~/gitlab-development-kit/postgresql/data
    (1 row)
  2. Stop postgres: gdk stop postgresql

  3. Corrupt the index file manually

    cd ~/gitlab-development-kit/postgresql/data/base/16580 # Datbase OID
    # Corrupt the index file
    # This replaces the first 128 bytes (critical header) with random data
    dd if=/dev/urandom of=2035130 bs=8 count=16 seek=0 conv=notrunc
  4. gdk start postgresql and gdk psql

    -- This should now fail with an error like:
    -- ERROR: invalid page in block 0 of relation base/16580/2035130
    SELECT bt_index_check('test_corruption_idx'::regclass);
  5. Test the CollationChecker returns corrupted index with this diff to add corrupted index

    diff --git a/lib/gitlab/database/collation_checker.rb b/lib/gitlab/database/collation_checker.rb
    index 11d017c5a227..8ea58f8cbcd6 100644
    --- a/lib/gitlab/database/collation_checker.rb
    +++ b/lib/gitlab/database/collation_checker.rb
    @@ -26,7 +26,8 @@ class CollationChecker
               'merge_request_diff_commit_users' => %w[
                 index_merge_request_diff_commit_users_on_name_and_email
                 index_merge_request_diff_commit_users_on_org_id_name_email
    -          ]
    +          ],
    +          'test_corruption_table' => %w[test_corruption_idx]
             }
           }.freeze
    pry(main)> Database::CollationCheckerWorker.new.perform
    => "OK"
    pry(main)> Gitlab::Redis::SharedState.with {|r| r.get(Database::CollationCheckerWorker::COLLATION_CHECK_CACHE_KEY)}
    => "{\"metadata\":{\"last_run_at\":\"2025-07-18T08:27:39Z\"},\"databases\":{\"main\":{\"collation_mismatches\":[],\"corrupted_indexes\":[{\"table_name\":\"test_corruption_table\",\"index_name\":\"test_corruption_idx\",\"corruption_types\":[\"structural\"]}]}}}"
    pry(main)> Gitlab::Json.parse(_)
    => {"metadata"=>{"last_run_at"=>"2025-07-18T08:27:39Z"}, "databases"=>{"main"=>{"collation_mismatches"=>[], "corrupted_indexes"=>[{"table_name"=>"test_corruption_table", "index_name"=>"test_corruption_idx", "corruption_types"=>["structural"]}]}}}
    pry(main)>
  6. Clean up

    DROP TABLE test_corruption_table CASCADE;

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Bishwa Hang Rai

Merge request reports

Loading