Use Backup::DatabaseConnection for collation_check

What does this MR do and why?

This change updates a database maintenance tool that checks for collation mismatches and unique index corruption in GitLab's databases. The main improvements are:

  1. Database restriction: The tool now only runs on specific allowed databases ("main" and "ci") instead of all databases, making it more focused and safer.
  2. Configurable table size limits: Users can now set a custom maximum table size limit through an environment variable (MAX_TABLE_SIZE), giving more control over which tables get checked based on their size.
  3. Better connection handling: The tool now uses Backup::DatabaseConnection, a more robust connection method designed for backup operations. It reads connection overrides from ENV variables, which makes it possible to bypass PgBouncer when needed.
  4. Updated architecture: Instead of using a simple class method call, the tool now creates individual checker instances for each database with specific parameters, allowing for better customization and control.
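
A minimal sketch of how points 1, 2, and 4 could fit together. This is not the code from this MR: `CollationCheckerSketch`, `build_checkers`, `ALLOWED_DATABASES`, and the default limit are hypothetical names and values chosen for illustration, and treating MAX_TABLE_SIZE as a plain integer is an assumption.

```ruby
# Hypothetical sketch: none of these names are taken from the GitLab codebase.
ALLOWED_DATABASES = %w[main ci].freeze

# Assumed fallback limit; the real default is not stated in this MR description.
DEFAULT_MAX_TABLE_SIZE = 1024 * 1024 * 1024

# Read the limit from the environment, falling back to the assumed default.
def max_table_size(env = ENV)
  value = env['MAX_TABLE_SIZE']
  return DEFAULT_MAX_TABLE_SIZE if value.nil? || value.strip.empty?

  Integer(value, 10)
end

# Stand-in for a per-database checker instance that carries its own parameters.
class CollationCheckerSketch
  attr_reader :database_name, :max_table_size

  def initialize(database_name:, max_table_size:)
    @database_name = database_name
    @max_table_size = max_table_size
  end

  def run
    "checking #{database_name} (tables up to #{max_table_size})"
  end
end

# Build one checker per configured database, skipping any that are not allowed.
def build_checkers(configured_databases, env = ENV)
  limit = max_table_size(env)

  (configured_databases & ALLOWED_DATABASES).map do |name|
    CollationCheckerSketch.new(database_name: name, max_table_size: limit)
  end
end

# Example: a three-database setup only yields checkers for main and ci.
build_checkers(%w[main ci sec], { 'MAX_TABLE_SIZE' => '2048' }).each do |checker|
  puts checker.run
end
```

Passing the parameters into each instance, rather than calling a class method, keeps the per-database settings explicit and makes single and multiple database setups easy to exercise in tests.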

The tests were also updated to verify these new behaviors work correctly in both single and multiple database setups.

How to set up and validate locally

  1. Check out this MR.

  2. Run the task: bin/rails gitlab:db:collation_checker:main. It should run without error.

  3. Run the task again with the correct ENV variable set:

    $ GITLAB_BACKUP_PGUSER=bishwa bin/rails gitlab:db:collation_checker:main
    I, [2025-08-21T18:09:26.787652 #90125]  INFO -- : Checking for PostgreSQL collation mismatches on main database...
    I, [2025-08-21T18:09:26.792673 #90125]  INFO -- : No collation version mismatches detected on main.
    I, [2025-08-21T18:09:26.792692 #90125]  INFO -- : Found 8 indexes to corruption spot check.
    I, [2025-08-21T18:09:26.839257 #90125]  INFO -- : No corrupted indexes detected.

To test that the task respects the ENV variable, set it to a wrong value and confirm that the task fails:

    $ GITLAB_BACKUP_PGUSER=foobar bin/rails gitlab:db:collation_checker:main
    I, [2025-08-21T18:10:19.240188 #90201]  INFO -- : Checking for PostgreSQL collation mismatches on main database...
    bin/rails aborted!
    ActiveRecord::DatabaseConnectionError: There is an issue connecting to your database with your username/password, username: foobar.
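
The behavior shown above, where an ENV variable changes which user the task connects as, can be pictured with the following sketch. It is an illustration only and not the implementation of Backup::DatabaseConnection: `connection_config`, `ENV_TO_CONFIG`, and the host and port values are hypothetical. Only GITLAB_BACKUP_PGUSER appears in this MR description; the PGHOST and PGPORT variants are assumed to follow the same pattern.

```ruby
# Hypothetical sketch of ENV-based connection overrides.
OVERRIDE_PREFIX = 'GITLAB_BACKUP_'

# Assumed mapping from variable suffix to connection setting.
ENV_TO_CONFIG = {
  'PGHOST' => :host,
  'PGPORT' => :port,
  'PGUSER' => :username
}.freeze

# Return a copy of base_config with any overrides from the environment applied.
def connection_config(base_config, env = ENV)
  ENV_TO_CONFIG.each_with_object(base_config.dup) do |(suffix, key), config|
    value = env["#{OVERRIDE_PREFIX}#{suffix}"]
    config[key] = value unless value.nil? || value.empty?
  end
end

# Example: the base config points at PgBouncer; the overrides point the task
# directly at PostgreSQL with a different user.
base = { host: 'pgbouncer.example', port: '6432', username: 'gitlab' }
overrides = { 'GITLAB_BACKUP_PGHOST' => 'postgres.example', 'GITLAB_BACKUP_PGPORT' => '5432' }
puts connection_config(base, overrides).inspect
```

Unset or empty variables leave the base configuration untouched, so the task behaves as before unless an override is supplied.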

Regression Test

  • Configure gdk for single database mode

    gdk config set gitlab.rails.databases.ci.enabled false
    gdk config set gitlab.rails.databases.sec.enabled false
    gdk reconfigure
  • Now run the specs and the tasks

    bundle exec rspec spec/tasks/gitlab/db_rake_spec.rb:635
    
    bin/rails gitlab:db:collation_checker
  • Revert the gdk config

    gdk config set gitlab.rails.databases.ci.enabled true
    gdk config set gitlab.rails.databases.sec.enabled true
    gdk reconfigure

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Bishwa Hang Rai
