Enhance collation checker
What does this MR do and why?
Enhance collation checker
This MR enhances collation checker to with few indexes to spot check based on the issue: #505982. We diverted from the initial approach to check all the existing B-Tree
indexes with text
column because the query would be very expensive for large tables. Now, we start with smaller known sets of index.
The result API from CollationChecker has been updated to be used in Admin Monitoring page later (Draft MR: !198009 (merged))
We have also added a worker class that will be called when Admin runs collation check, to generate result async, for Admin Monitoring page.
References
- Related to Integrate Collation Checker with Admin UI for D... (#555916 - closed)
- Previous MR: Add gitlab:db:collation_checker rake task (!195450 - merged)
- Admin Page MR: Add Database Diagnostics monitoring page (!198009 - merged)
Database
- Fetching Index Info: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/41641/commands/127849
-
bt_index_check
- index_merge_request_diff_commit_users_on_name_and_email
- index_merge_request_diff_commit_users_on_org_id_name_email
- Finding Duplicates:
- index_merge_request_diff_commit_users_on_name_and_email
- index_merge_request_diff_commit_users_on_org_id_name_email
How to set up and validate locally
-
Checkout this MR branch and run
gdk rails c
-
Run the
Database::CollationChecker
or theDatabase::CollationCheckerWorker
pry(main)> Database::CollationCheckerWorker.new.perform => "OK" pry(main)> Gitlab::Redis::SharedState.with {|r| r.get(Database::CollationCheckerWorker::COLLATION_CHECK_CACHE_KEY)} => "{\"metadata\":{\"last_run_at\":\"2025-07-18T08:35:08Z\"},\"databases\":{\"main\":{\"collation_mismatches\":[],\"corrupted_indexes\":[]}}}" pry(main)> Gitlab::Json.parse(_) {"metadata"=>{"last_run_at"=>"2025-07-18T08:35:08Z"}, "databases"=>{"main"=>{"collation_mismatches"=>[], "corrupted_indexes"=>[]}}} => {"metadata"=>{"last_run_at"=>"2025-07-18T08:35:08Z"}, "databases"=>{"main"=>{"collation_mismatches"=>[], "corrupted_indexes"=>[]}}}
To create a corruption for shake of testing:
-
Create setup with test tables and fetch information on database and index
OID
$ gdk psql CREATE TABLE test_corruption_table ( id serial PRIMARY KEY, text_col text NOT NULL ); CREATE INDEX test_corruption_idx ON test_corruption_table (text_col); INSERT INTO test_corruption_table (text_col) SELECT 'value' || i FROM generate_series(1, 100) i; -- Get the database OID SELECT oid, datname FROM pg_database WHERE datname = current_database(); oid | datname -------+---------------------- 16580 | gitlabhq_development (1 row) -- Get the index file information SELECT c.oid, c.relname, c.relkind, c.relfilenode FROM pg_class c WHERE c.relname = 'test_corruption_idx'; oid | relname | relkind | relfilenode ---------+---------------------+---------+------------- 2035130 | test_corruption_idx | i | 2035130 (1 row) SHOW data_directory; data_directory --------------------------------------------------------------------- ~/gitlab-development-kit/postgresql/data (1 row)
-
Stop postgres:
gdk stop postgresql
-
Corrupt the index file manually
cd ~/gitlab-development-kit/postgresql/data/base/16580 # Datbase OID # Corrupt the index file # This replaces the first 128 bytes (critical header) with random data dd if=/dev/urandom of=2035130 bs=8 count=16 seek=0 conv=notrunc
-
gdk start postgresql
andgdk psql
-- This should now fail with an error like: -- ERROR: invalid page in block 0 of relation base/16580/2035130 SELECT bt_index_check('test_corruption_idx'::regclass);
-
Test the
CollationChecker
returns corrupted index with this diff to add corrupted indexdiff --git a/lib/gitlab/database/collation_checker.rb b/lib/gitlab/database/collation_checker.rb index 11d017c5a227..8ea58f8cbcd6 100644 --- a/lib/gitlab/database/collation_checker.rb +++ b/lib/gitlab/database/collation_checker.rb @@ -26,7 +26,8 @@ class CollationChecker 'merge_request_diff_commit_users' => %w[ index_merge_request_diff_commit_users_on_name_and_email index_merge_request_diff_commit_users_on_org_id_name_email - ] + ], + 'test_corruption_table' => %w[test_corruption_idx] } }.freeze
pry(main)> Database::CollationCheckerWorker.new.perform => "OK" pry(main)> Gitlab::Redis::SharedState.with {|r| r.get(Database::CollationCheckerWorker::COLLATION_CHECK_CACHE_KEY)} => "{\"metadata\":{\"last_run_at\":\"2025-07-18T08:27:39Z\"},\"databases\":{\"main\":{\"collation_mismatches\":[],\"corrupted_indexes\":[{\"table_name\":\"test_corruption_table\",\"index_name\":\"test_corruption_idx\",\"corruption_types\":[\"structural\"]}]}}}" pry(main)> Gitlab::Json.parse(_) => {"metadata"=>{"last_run_at"=>"2025-07-18T08:27:39Z"}, "databases"=>{"main"=>{"collation_mismatches"=>[], "corrupted_indexes"=>[{"table_name"=>"test_corruption_table", "index_name"=>"test_corruption_idx", "corruption_types"=>["structural"]}]}}} pry(main)>
-
Clean up
DROP TABLE test_corruption_table CASCADE;
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.