Use Backup::DatabaseConnection for collation_check
What does this MR do and why?
This change updates a database maintenance tool that checks GitLab's databases for collation version mismatches and unique index corruption. The main improvements are:
- Database restriction: The tool now only runs on specific allowed databases ("main" and "ci") instead of all databases, making it more focused and safer.
- Configurable table size limit: Users can now set a custom maximum table size through an environment variable (MAX_TABLE_SIZE), giving more control over which tables get checked based on their size.
- Better connection handling: The tool now uses Backup::DatabaseConnection, a connection method designed specifically for backup operations, so PgBouncer can be bypassed, if needed, by setting environment variables.
- Updated architecture: Instead of using a simple class method call, the tool now creates individual checker instances for each database with specific parameters, allowing for better customization and control.
The tests were also updated to verify these new behaviors work correctly in both single and multiple database setups.
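To make the first and last bullets concrete, here is a minimal sketch of an allow-list of databases, MAX_TABLE_SIZE parsing, and one checker instance per database. It is not the code in this MR: ALLOWED_DATABASES, CheckerSketch, max_table_size_from, and build_checkers are hypothetical names, and treating MAX_TABLE_SIZE as a plain integer (for example, bytes) is an assumption.

```ruby
# Hypothetical sketch of the allow-list, MAX_TABLE_SIZE parsing, and per-database
# checker construction. None of these names are GitLab's actual code.
ALLOWED_DATABASES = %w[main ci].freeze

# Stand-in for the real checker class; it only records its parameters.
CheckerSketch = Struct.new(:database_name, :max_table_size, keyword_init: true)

# Returns nil when MAX_TABLE_SIZE is unset or blank (keep the checker's default);
# otherwise returns a positive Integer. Reading the value as a plain integer
# (for example, bytes) is an assumption.
def max_table_size_from(env)
  raw = env['MAX_TABLE_SIZE']
  return nil if raw.nil? || raw.strip.empty?

  size = Integer(raw.strip, 10) # raises ArgumentError for non-numeric input
  raise ArgumentError, "MAX_TABLE_SIZE must be positive, got #{raw.inspect}" unless size.positive?

  size
end

# Builds one checker per configured database, skipping names outside the allow-list.
def build_checkers(configured_databases, env)
  max_size = max_table_size_from(env)

  configured_databases
    .select { |name| ALLOWED_DATABASES.include?(name) }
    .map { |name| CheckerSketch.new(database_name: name, max_table_size: max_size) }
end

# Multiple-database setup: "sec" is configured but not allow-listed, so it is skipped.
checkers = build_checkers(%w[main ci sec], { 'MAX_TABLE_SIZE' => '1073741824' })
checkers.each { |c| puts "#{c.database_name}: #{c.max_table_size.inspect}" }
```

In single-database mode only main is configured, so build_checkers(%w[main], ENV) would return a single checker. Returning nil when the variable is unset keeps the default limit in one place (the checker) instead of duplicating it in the task.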
References
- Previous MR: Use Backup::DatabaseConnection if configured (!202095 - merged)
- Related to Provide a way to bypass PgBouncer for db mainte... (#562640 - closed)
How to set up and validate locally
- Check out this MR.
- Run the task; it should complete without errors:
bin/rails gitlab:db:collation_checker:main
- Run the task again with a valid GITLAB_BACKUP_PGUSER value set:
$ GITLAB_BACKUP_PGUSER=bishwa bin/rails gitlab:db:collation_checker:main
I, [2025-08-21T18:09:26.787652 #90125] INFO -- : Checking for PostgreSQL collation mismatches on main database...
I, [2025-08-21T18:09:26.792673 #90125] INFO -- : No collation version mismatches detected on main.
I, [2025-08-21T18:09:26.792692 #90125] INFO -- : Found 8 indexes to corruption spot check.
I, [2025-08-21T18:09:26.839257 #90125] INFO -- : No corrupted indexes detected.
To verify that the task respects the environment variable, set an invalid value and confirm that it fails:
$ GITLAB_BACKUP_PGUSER=foobar bin/rails gitlab:db:collation_checker:main
I, [2025-08-21T18:10:19.240188 #90201] INFO -- : Checking for PostgreSQL collation mismatches on main database...
bin/rails aborted!
ActiveRecord::DatabaseConnectionError: There is an issue connecting to your database with your username/password, username: foobar.
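The failure above shows that GITLAB_BACKUP_PGUSER replaces the username the task connects with. A minimal sketch of that kind of override, assuming a hash-based connection config, is below. This is not how Backup::DatabaseConnection is implemented: ENV_OVERRIDES and connection_config_with_overrides are hypothetical, only GITLAB_BACKUP_PGUSER comes from this MR, and GITLAB_BACKUP_PGHOST and GITLAB_BACKUP_PGPORT are assumed to follow the same naming pattern. Redirecting host and port is what would let the task reach PostgreSQL directly instead of going through PgBouncer.

```ruby
# Hypothetical sketch: GITLAB_BACKUP_PGUSER comes from this MR; the other
# variable names and this merge logic are assumptions, not Backup::DatabaseConnection.
ENV_OVERRIDES = {
  'GITLAB_BACKUP_PGHOST' => :host,
  'GITLAB_BACKUP_PGPORT' => :port,
  'GITLAB_BACKUP_PGUSER' => :username
}.freeze

# Returns a copy of base_config with every set, non-empty override applied;
# base_config itself is left unchanged.
def connection_config_with_overrides(base_config, env)
  ENV_OVERRIDES.each_with_object(base_config.dup) do |(var, key), config|
    value = env[var]
    next if value.nil? || value.empty?

    config[key] = key == :port ? Integer(value, 10) : value
  end
end

# The application normally connects through PgBouncer (hypothetical host names).
base = { host: 'pgbouncer.internal', port: 6432, username: 'gitlab', database: 'gitlabhq_production' }

# Point the maintenance task directly at PostgreSQL instead.
direct = connection_config_with_overrides(
  base,
  { 'GITLAB_BACKUP_PGHOST' => 'postgres.internal', 'GITLAB_BACKUP_PGPORT' => '5432' }
)
puts direct.inspect
```

Variables that are unset leave the corresponding key untouched, so a task run without any GITLAB_BACKUP_* variables uses the application's normal connection settings.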
Regression Test
- Configure GDK for single-database mode:
gdk config set gitlab.rails.databases.ci.enabled false
gdk config set gitlab.rails.databases.sec.enabled false
gdk reconfigure
- Run the specs and the task:
bundle exec rspec spec/tasks/gitlab/db_rake_spec.rb:635
bin/rails gitlab:db:collation_checker
- Revert the GDK config:
gdk config set gitlab.rails.databases.ci.enabled true
gdk config set gitlab.rails.databases.sec.enabled true
gdk reconfigure
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.