feat: Add schema version check script for ClickHouse DB (Attempt 2)
What does this MR do?
This is the second attempt at adding a check to confirm that ClickHouse migrations have been executed before starting the GitLab Rails container. The first attempt, MR !2624 (merged), had to be reverted in !2636 (merged): when the change was deployed to gstg-cny without a valid click_house.yml configuration file, the wait-for-deps script failed, causing the deployment to fail. This led to incident gitlab-com/gl-infra/production#20478 (closed).
This MR contains two commits. The first commit (1124f95e) contains the changes from the original MR, cherry-picked back. The second commit (fcb49379) contains the bug fix, which ensures that the check does not fail when ClickHouse is not enabled for a GitLab installation.
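The idea behind the bug fix, as the log line "ClickHouse is not configured. Skipping migration checks." in the Testing section suggests, is to treat an absent ClickHouse configuration as "not enabled" rather than as a failure. The Ruby sketch below illustrates that idea; it is not the code from fcb49379. The method name clickhouse_databases, the production default, and the assumption that click_house.yml is keyed by Rails environment and then by database name (for example production, then main) are all illustrative assumptions.

```ruby
require 'yaml'

# Minimal sketch, NOT the actual CNG implementation: find the ClickHouse
# databases configured in click_house.yml.
# Assumption: the file lives in CONFIG_DIRECTORY and is keyed by Rails
# environment, then by database name (e.g. production -> main).
def clickhouse_databases(config_dir, env_name = 'production')
  path = File.join(config_dir, 'click_house.yml')
  # A missing file means ClickHouse is not enabled for this installation.
  return {} unless File.exist?(path)

  data = YAML.safe_load(File.read(path), aliases: true)
  # An empty file, or one whose top level is not a mapping, is treated the
  # same as a missing file instead of failing the whole wait-for-deps run.
  return {} unless data.is_a?(Hash)

  databases = data[env_name]
  databases.is_a?(Hash) ? databases : {}
end

databases = clickhouse_databases(ENV.fetch('CONFIG_DIRECTORY', '/srv/gitlab/config'))
if databases.empty?
  puts '[ClickHouse] INFO: ClickHouse is not configured. Skipping migration checks.'
else
  databases.each_key { |name| puts "[ClickHouse] INFO: Configuring ClickHouse DB #{name}" }
end
```

Returning an empty hash instead of raising lets the caller log a skip message and carry on with the Redis and PostgreSQL checks.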
feat: Add schema version check script for ClickHouse DB
This MR adds a check to the gitlab-rails/scripts/wait-for-deps script that confirms whether all
ClickHouse migrations have been executed. The wait-for-deps script is run by the dependencies
init container of the GitLab Helm chart installation. When this script fails, start-up is blocked
for the main containers of the webservice and sidekiq pods (the containers in which the GitLab
Rails codebase runs). wait-for-deps already checks that GitLab's mandatory dependencies, Redis and
PostgreSQL, are available, and that all regular migrations have been executed for PostgreSQL.
This check is introduced in a "default to off" state, so it will not fail when a user upgrades
their gitlab-rails image to a version containing this commit. This default will be maintained
until the 18.5 required stop release. Starting in release 18.6, the default will be updated to use
the value of BYPASS_SCHEMA_VERSION. Users who want to enable this check immediately may set
BYPASS_CLICKHOUSE_SCHEMA_VERSION=false.
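The defaulting rules above can be expressed as a small predicate. This is a minimal Ruby sketch under stated assumptions, not the CNG implementation: the method name and the legacy_default switch are hypothetical, values are compared after stripping whitespace, and the 18.6 fallback is assumed to apply the same "anything other than false or 0" rule to BYPASS_SCHEMA_VERSION.

```ruby
# Minimal sketch, NOT the actual CNG code, of resolving the ClickHouse bypass
# flag. `env` is any Hash-like object such as ENV.
# `legacy_default` is a hypothetical switch: true models releases up to 18.5
# (check off unless explicitly enabled); false models 18.6+, where an unset
# variable falls back to BYPASS_SCHEMA_VERSION (assumed semantics).
FALSY_VALUES = %w[false 0].freeze

def bypass_clickhouse_schema_version?(env, legacy_default: true)
  raw = env['BYPASS_CLICKHOUSE_SCHEMA_VERSION']

  if raw.nil?
    # Up to 18.5: unset means "check off", i.e. bypassed.
    return true if legacy_default

    # From 18.6 (assumed): inherit BYPASS_SCHEMA_VERSION; unset means enforce.
    raw = env['BYPASS_SCHEMA_VERSION']
    return false if raw.nil?
  end

  # Anything other than "false" or "0" bypasses the check.
  !FALSY_VALUES.include?(raw.strip)
end
```

In a real script ENV can be passed directly, since it responds to `[]`: `bypass_clickhouse_schema_version?(ENV)`. An explicit BYPASS_CLICKHOUSE_SCHEMA_VERSION always wins over the fallback, which keeps the ClickHouse and PostgreSQL switches independent.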
In gitlab-org/charts/gitlab!4458 (merged), we are going to enable ClickHouse migrations to run in Helm within the GitLab-Migrations chart. Before merging that change, we want a mechanism that confirms ClickHouse migrations have been executed before the Rails container starts up.
The schema version check for ClickHouse follows the same logic as lib/checks/postgresql.rb. It
introduces a new environment variable that users can set to disable the version check for
ClickHouse while keeping it enabled for PostgreSQL.
- BYPASS_CLICKHOUSE_SCHEMA_VERSION: If set to anything other than false or 0, the check will pass even if some regular migrations have not been executed in ClickHouse. If the database has not been created or no migrations have been run, the check will fail even if this environment variable is set to true.
- BYPASS_POST_DEPLOYMENT=true: If set, the check will pass as long as regular migrations have been executed. ClickHouse supports post-deployment migrations (PDMs), though the GitLab codebase does not contain any yet.
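Taken together, the two variables give the following pass/fail rules. The sketch below is a hypothetical Ruby rendering of those rules, not the code in lib/checks; the method and keyword names are assumptions, versions are migration timestamps as strings, and applied is nil when the schema_migrations table does not exist.

```ruby
# Minimal sketch (hypothetical names, NOT the CNG code) of the pass/fail rules.
#   applied:     versions read from schema_migrations, or nil if it does not exist
#   regular:     versions of regular migrations shipped with the codebase
#   post_deploy: versions of post-deployment migrations (currently none)
def clickhouse_schema_ok?(applied:, regular:, post_deploy: [], bypass: false, bypass_post_deployment: false)
  # Database not created, or no migrations run yet: fail even when bypassed.
  return false if applied.nil? || applied.empty?

  # BYPASS_CLICKHOUSE_SCHEMA_VERSION tolerates pending migrations.
  return true if bypass

  # BYPASS_POST_DEPLOYMENT only requires the regular migrations.
  required = bypass_post_deployment ? regular : regular + post_deploy
  (required - applied).empty?
end
```

The early return for an uninitialized database is what keeps the bypass from masking a ClickHouse instance on which migrations have never run.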
I tested this locally by running the dependencies (Redis, PostgreSQL, and ClickHouse) with Docker
Compose, then running a GitLab Rails container on the same network and executing the
wait-for-deps script in various scenarios.
Testing
Please see the Testing section from the original MR: !2624 (merged)
In addition, I tested the following cases, assuming a Rails installation that starts with only PostgreSQL and later has ClickHouse enabled:
Start Rails without ClickHouse
# Confirm that the check does not look at ClickHouse
$ docker run --network gitlab-rails_gitlab \
-it --rm \
-v ./scripts:/scripts-new \
-v ./config/database.yml:/srv/gitlab/config/database.yml \
-v ./config/resque.yml:/srv/gitlab/config/resque.yml \
-w /scripts-new \
--name webservice-testing \
registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ee:master -- bash
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
root@5d8e0b2c6498:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: ClickHouse is not configured. Skipping migration checks.
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
Checking: main
Error checking main: We could not find your database: gitlabhq_production. Available database configurations can be found in config/database.yml.
To resolve this error:
- Did you not create the database, or did you delete it? To create the database, run:
bin/rails db:create
- Has the database name changed? Verify that config/database.yml contains the correct database name.
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@5d8e0b2c6498:/scripts-new# echo $?
1
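This run exits with status 1 because the PostgreSQL main database does not exist, even though Redis is reachable. A minimal Ruby sketch of that aggregation follows, assuming that wait-for-deps re-runs its checks until WAIT_FOR_TIMEOUT seconds have elapsed and exits non-zero if any check is still failing. The method name, the hash of callables, and the polling interval are hypothetical; only the WARNING text is taken from the output above.

```ruby
# Hypothetical sketch, NOT the real wait-for-deps script: re-run every check
# until all pass or `timeout` seconds elapse, then return an exit status.
def wait_for_deps(checks, timeout:, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    # Each check is a callable returning true (healthy) or false.
    failed = checks.reject { |_name, check| check.call }.keys
    return 0 if failed.empty?

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      warn "Failed checks: #{failed.join(', ')}"
      warn 'WARNING: Not all services were operational, with data migrations completed.'
      return 1
    end

    sleep interval
  end
end
```

A caller would pass the value of WAIT_FOR_TIMEOUT as `timeout` and exit with the returned status. Using a monotonic clock keeps the deadline correct even if the container's wall clock is adjusted while the loop is waiting.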
Set up PostgreSQL and confirm that the check passes
root@5d8e0b2c6498:/scripts-new# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile db:setup
Missing Rails.application.credentials.secret_key_base for production environment. The secret will be generated and stored in config/secrets.yml.
Missing Rails.application.credentials.otp_key_base for production environment. The secret will be generated and stored in config/secrets.yml.
Missing Rails.application.credentials.db_key_base for production environment. The secret will be generated and stored in config/secrets.yml.
Missing Rails.application.credentials.openid_connect_signing_key for production environment. The secret will be generated and stored in config/secrets.yml.
Missing Rails.application.credentials.active_record_encryption_primary_key for production environment. The secret will be generated and stored in config/secrets.yml.
Missing Rails.application.credentials.active_record_encryption_deterministic_key for production environment. The secret will be generated and stored in config/secrets.yml.
Missing Rails.application.credentials.active_record_encryption_key_derivation_salt for production environment. The secret will be generated and stored in config/secrets.yml.
Creating a backup of secrets file /srv/gitlab/config/secrets.yml at /srv/gitlab/tmp/backups/secrets.yml.orig.1756976406
Created database 'gitlabhq_production'
root@5d8e0b2c6498:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: ClickHouse is not configured. Skipping migration checks.
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
Checking: main
Database Schema - main (gitlabhq_production)
root@5d8e0b2c6498:/scripts-new# echo $?
0
Start Rails with ClickHouse enabled and confirm that the check fails
$ docker run --network gitlab-rails_gitlab \
-it --rm \
-v ./scripts:/scripts-new \
-v ./config/database.yml:/srv/gitlab/config/database.yml \
-v ./config/resque.yml:/srv/gitlab/config/resque.yml \
-v ./config/click_house.yml:/srv/gitlab/config/click_house.yml \
-w /scripts-new \
--name webservice-testing \
registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ee:master -- bash
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
root@6cc9401ddae7:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-09-04T09:02:02.969576 #9] INFO -- : SELECT version FROM schema_migrations
[ClickHouse] FATAL: Error while fetching the database versions for ClickHouse main DB: Code: 60. DB::Exception: Unknown table expression identifier 'schema_migrations' in scope SELECT version FROM schema_migrations. (UNKNOWN_TABLE) (version 25.7.4.11 (official build))
[ClickHouse] NOTICE: Database has not been initialized yet.
[ClickHouse] INFO: There are 131 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@6cc9401ddae7:/scripts-new# echo $?
1
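The "Database has not been initialized yet" notice followed by "There are 131 migrations pending" is consistent with treating a missing schema_migrations table as zero applied migrations. A hypothetical Ruby sketch of that computation follows, assuming migration files follow the Rails `<timestamp>_<name>.rb` naming convention; the method names are illustrative and not taken from the GitLab codebase.

```ruby
# Minimal sketch with hypothetical names, NOT the CNG code: find pending
# ClickHouse migrations by comparing migration file names on disk with the
# versions stored in schema_migrations.
# Assumption: files follow the Rails "<14-digit timestamp>_<name>.rb" convention.
VERSION_PATTERN = /\A(\d{14})_\w+\.rb\z/

def migration_versions(paths)
  paths.filter_map { |path| File.basename(path)[VERSION_PATTERN, 1] }.uniq.sort
end

# applied_versions is nil when schema_migrations does not exist yet
# (the UNKNOWN_TABLE error above); then every migration is pending.
def pending_migrations(paths, applied_versions)
  migration_versions(paths) - Array(applied_versions)
end
```

In practice the paths would come from something like `Dir.glob(File.join(migrations_dir, '*.rb'))`, where migrations_dir is a placeholder for wherever the ClickHouse migrations for a given database live. After running the migrations, the same comparison yields the "0 migrations pending" result seen in the final test case.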
Run ClickHouse migrations and then confirm that the check passes
root@6cc9401ddae7:/scripts-new# /scripts/db-migrate
[snip]
root@6cc9401ddae7:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-09-04T09:05:30.516605 #1137] INFO -- : SELECT version FROM schema_migrations
[ClickHouse] INFO: There are 0 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Database Schema - main (gitlabhq_production)
root@6cc9401ddae7:/scripts-new# echo $?
0
Related issues
gitlab-com/gl-infra/delivery#21436 (closed) and gitlab-org/charts/gitlab!4458 (merged).
Checklist
See Definition of done.
For anything in this list which will not be completed, please provide a reason in the MR discussion
Required
- Merge Request Title, and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
- When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow
Expected (please provide an explanation if not completing)
- Test plan indicating conditions for success has been posted and passes
- Documentation created/updated
- Integration tests added to GitLab QA
- The impact any change in container size has should be evaluated
- New dependencies are managed with GitLab forked renovatebot