feat: Add schema version check script for ClickHouse DB
What does this MR do?
Related to gitlab-com/gl-infra/delivery#21436 (closed) and gitlab-org/charts/gitlab!4458 (merged).
This MR adds a check to the gitlab-rails/scripts/wait-for-deps script that confirms whether all
ClickHouse migrations have been executed. The wait-for-deps script is run by the dependencies
init container of the GitLab Helm chart installation. When this script fails, start-up is blocked
for the main containers of the webservice and sidekiq pods (the containers in which the GitLab
Rails codebase runs). wait-for-deps already checks that the mandatory dependencies of GitLab, Redis
and PostgreSQL, are available, and that all regular migrations have been executed for PostgreSQL.
This check is being introduced in the "default to off" state. So, it will not fail when a user
upgrades their gitlab-rails image to a version containing this commit. This default will be
maintained until the required stop release %18.5. The default will then be updated to use the value
of BYPASS_SCHEMA_VERSION starting in %18.6. Users may set
BYPASS_CLICKHOUSE_SCHEMA_VERSION=false if they want to enable this check immediately.
In gitlab-org/charts/gitlab!4458 (merged), we are going to enable migrations for ClickHouse to run in Helm within the GitLab-Migrations chart. Before merging that change, we want a mechanism to confirm that ClickHouse migrations are executed before starting up the Rails container.
The schema version check for ClickHouse follows the same logic as lib/checks/postgresql.rb. It
introduces a new environment variable that can be used by users to disable the version check for
ClickHouse, while keeping it enabled for Postgres.
- BYPASS_CLICKHOUSE_SCHEMA_VERSION: If set to anything except false or 0, the check will pass even if some regular migrations have not been executed in ClickHouse. If the DB has not been created or no migrations have been run, the check will fail even if this environment variable is set to true.
- BYPASS_POST_DEPLOYMENT=true: If set, the check will pass as long as regular migrations are executed. ClickHouse has support for post-deployment migrations, though there are no PDMs as of now in the GitLab codebase.
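The rules for BYPASS_CLICKHOUSE_SCHEMA_VERSION can be summarised in a small Ruby sketch. The method names (`bypass_requested?`, `clickhouse_check_passes?`) and the array-based inputs are hypothetical and chosen for illustration; this is not the code added by the MR, and it deliberately leaves out BYPASS_POST_DEPLOYMENT and BYPASS_SCHEMA_VERSION.

```ruby
# Illustrative only: hypothetical helpers, not the MR's implementation.

# Any value other than the literal strings "false" or "0" requests a bypass.
# An unset variable (nil) is treated like an empty string.
def bypass_requested?(raw_value)
  !%w[false 0].include?(raw_value.to_s)
end

# applied: versions recorded in the database.
# known:   versions shipped with the codebase.
def clickhouse_check_passes?(applied:, known:, bypass_value:)
  # An uninitialized database (no migrations recorded) fails regardless of the bypass.
  return false if applied.empty?

  pending = known - applied
  pending.empty? || bypass_requested?(bypass_value)
end

known = %w[20250808063619 20250808064130]
puts clickhouse_check_passes?(applied: known.first(1), known: known, bypass_value: 'true')  # true
puts clickhouse_check_passes?(applied: known.first(1), known: known, bypass_value: 'false') # false
puts clickhouse_check_passes?(applied: [], known: known, bypass_value: 'true')              # false
```

The third call reflects the rule that an uninitialized database fails the check even when the bypass is set.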
I tested this locally by running the dependencies (Redis, PostgreSQL, and ClickHouse) using Docker
Compose, and running a GitLab Rails container within the same network and executing the
wait-for-deps script in various scenarios.
Related issues
gitlab-com/gl-infra/delivery#21436 (closed)
Testing
I tested this locally by running the dependencies using Docker Compose:
docker-compose.yml
services:
  redis-service:
    image: "redis:alpine"
    networks:
      - gitlab
  postgres-db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: example
      POSTGRES_USER: user
      # Don't specify DB here. GitLab will create its database by itself using Rake tasks.
      # POSTGRES_DB: gitlabhq_production
    networks:
      - gitlab
  clickhouse-db:
    image: clickhouse
    environment:
      CLICKHOUSE_PASSWORD: example
      CLICKHOUSE_USER: user
      CLICKHOUSE_DB: gitlab_clickhouse_production
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    networks:
      - gitlab
networks:
  gitlab:
I wrote the appropriate YAML configurations for these as well:
YAML configurations for gitlab-rails
$ for i in config/*.yml; do echo "# $i"; cat $i; echo "# ---"; done
# config/click_house.yml
production:
  main:
    database: gitlab_clickhouse_production
    url: "http://clickhouse-db:8123"
    username: user
    password: example
    variables:
      enable_http_compression: 1
      date_time_input_format: basic # needed for CH cloud
# ---
# config/database.yml
production:
  main:
    adapter: postgresql
    encoding: unicode
    database: gitlabhq_production
    username: user
    password: "example"
    host: postgres-db
  ci:
    adapter: postgresql
    encoding: unicode
    database: gitlabhq_production
    username: user
    password: "example"
    host: postgres-db
    database_tasks: false
# ---
# config/resque.yml
production:
  url: redis://redis-service:6379/0
# ---
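Because YAML nesting is significant, a quick way to sanity-check a file such as config/click_house.yml is to parse it and read the keys back. The snippet below is a minimal sketch that inlines the content instead of reading it from disk; it does not reflect how GitLab Rails itself loads the file.

```ruby
require 'yaml'

# Inline copy of config/click_house.yml, used only for this illustration.
click_house_yml = <<~YAML
  production:
    main:
      database: gitlab_clickhouse_production
      url: "http://clickhouse-db:8123"
      username: user
      password: example
      variables:
        enable_http_compression: 1
        date_time_input_format: basic
YAML

config = YAML.safe_load(click_house_yml)
main = config.fetch('production').fetch('main')

# fetch raises KeyError if the nesting is not what we expect.
puts main.fetch('url')
puts main.fetch('database')
puts main.fetch('variables').fetch('enable_http_compression')
```

If the indentation were lost, `fetch` would raise a `KeyError` (or the parse would fail), which makes nesting mistakes visible before the container is started.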
With these in place, start all the dependencies using Docker Compose:
docker compose -f docker-compose.yml up
Then, start a GitLab Rails container using this MR's image in the network that was created by Docker Compose:
$ docker run --network gitlab-rails_gitlab \
    -it --rm \
    -v ./config/database.yml:/srv/gitlab/config/database.yml \
    -v ./config/resque.yml:/srv/gitlab/config/resque.yml \
    -v ./config/click_house.yml:/srv/gitlab/config/click_house.yml \
    -w /scripts \
    --name webservice-testing \
    registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ee:sk-development-clickhouse-migration-check -- bash
This container can now reach all the dependency containers. So, I tested the wait-for-deps script in various situations, one by one.
1. Database Not Setup
Neither the Postgres DB nor the ClickHouse DB exists. So, both checks fail. The check for Redis succeeds because it does not have migrations.
Output
$ docker run --network gitlab-rails_gitlab \
    -it --rm \
    -v ./config/database.yml:/srv/gitlab/config/database.yml \
    -v ./config/resque.yml:/srv/gitlab/config/resque.yml \
    -v ./config/click_house.yml:/srv/gitlab/config/click_house.yml \
    -v ./scripts:/scripts-overlay \
    -w /scripts \
    --name webservice-testing \
    registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ee:sk-check-clickhouse-migration-state -- bash
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
root@a21c67e5850d:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-30T03:02:12.967336 #83] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] FATAL: Error while fetching the database versions for ClickHouse main DB: Code: 60. DB::Exception: Unknown table expression identifier 'schema_migrations' in scope SELECT version FROM schema_migrations. (UNKNOWN_TABLE) (version 25.7.4.11 (official build))
[ClickHouse] NOTICE: Database has not been initialized yet.
[ClickHouse] INFO: There are 126 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Error checking main: We could not find your database: gitlabhq_production. Available database configurations can be found in config/database.yml.
To resolve this error:
- Did you not create the database, or did you delete it? To create the database, run:
bin/rails db:create
- Has the database name changed? Verify that config/database.yml contains the correct database name.
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@a21c67e5850d:/scripts-new# echo $?
1
All the following commands are run in sequence inside a similar container.
2. Database Setup
The Postgres and ClickHouse DBs can be set up using a couple of Rake tasks:
root@a21c67e5850d:/scripts-new# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile db:setup
Missing Rails.application.credentials.secret_key_base for production environment. The secret will be generated and stored in config/secrets.yml.
[snip]
Creating a backup of secrets file /srv/gitlab/config/secrets.yml at /srv/gitlab/tmp/backups/secrets.yml.orig.1756523038
Created database 'gitlabhq_production'
root@a21c67e5850d:/scripts-new# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile gitlab:clickhouse:setup
Running gitlab:clickhouse:setup:main rake task
root@a21c67e5850d:/scripts-new#
This is not enough for the check to pass, even with BYPASS_SCHEMA_VERSION, because the DBs exist but have not been initialized:
Output
root@a21c67e5850d:/scripts-new# BYPASS_SCHEMA_VERSION=true WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-30T03:05:35.777151 #245] INFO -- : SELECT version FROM schema_migrations
[ClickHouse] NOTICE: Database has not been initialized yet.
[ClickHouse] INFO: There are 126 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@a21c67e5850d:/scripts-new# echo $?
1
As seen here, ClickHouse reports 126 pending migrations.
3. Database Setup With Pending Migrations for Postgres and ClickHouse
I ran all the regular migrations for Postgres and then rolled back 3 migrations.
For ClickHouse, I tried gitlab:clickhouse:migrate VERSION=... but that kept giving me an error. So, I moved some migrations out of the directory, and then ran all the migrations, and then moved the migrations back.
# Move ClickHouse migrations out
root@75337c7ebbb6:/scripts# mv /srv/gitlab/db/click_house/migrate/main/{20250808063619_create_hierarchy_audit_events.rb,20250808064130_create_hierarchy_audit_events_mv.rb} /srv/gitlab/db/click_house/schema_migrations/main/{20250808063619,20250808064130} /tmp
# ... run all the migrations ...
root@75337c7ebbb6:/scripts# /scripts/db-migrate
# ... move the ClickHouse migrations back.
root@75337c7ebbb6:/scripts# cp -v /tmp/{20250808063619_create_hierarchy_audit_events.rb,20250808064130_create_hierarchy_audit_events_mv.rb} /srv/gitlab/db/click_house/migrate/main/; cp -v /tmp/{20250808063619,20250808064130} /srv/gitlab/db/click_house/schema_migrations/main/
'/tmp/20250808063619_create_hierarchy_audit_events.rb' -> '/srv/gitlab/db/click_house/migrate/main/20250808063619_create_hierarchy_audit_events.rb'
'/tmp/20250808064130_create_hierarchy_audit_events_mv.rb' -> '/srv/gitlab/db/click_house/migrate/main/20250808064130_create_hierarchy_audit_events_mv.rb'
'/tmp/20250808063619' -> '/srv/gitlab/db/click_house/schema_migrations/main/20250808063619'
'/tmp/20250808064130' -> '/srv/gitlab/db/click_house/schema_migrations/main/20250808064130'
# ... Rollback Postgres by 3 steps
root@75337c7ebbb6:/scripts# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile db:rollback STEP=3
At this point, we have 3 pending Postgres migrations and 2 pending ClickHouse migrations.
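To make the notion of "pending" concrete, here is a minimal Ruby sketch of how a pending list could be derived by comparing migration file names against the versions returned by `SELECT version FROM schema_migrations`. The helper names are hypothetical, and the assumption that a migration's version is the numeric prefix of its file name is inferred from the file names shown above; the actual check in this MR may compute this differently.

```ruby
# Hypothetical helpers for illustration; not the MR's implementation.

# Assumes a migration's version is the numeric prefix of its file name.
def versions_from_files(paths)
  paths.filter_map { |path| File.basename(path)[/\A(\d+)_/, 1] }
end

# Versions present on disk but not recorded in the database.
def pending_versions(paths, applied_versions)
  versions_from_files(paths) - applied_versions.map(&:to_s)
end

files = %w[
  db/click_house/migrate/main/20250808063619_create_hierarchy_audit_events.rb
  db/click_house/migrate/main/20250808064130_create_hierarchy_audit_events_mv.rb
]

puts pending_versions(files, []).length                  # 2
puts pending_versions(files, ['20250808063619']).inspect # ["20250808064130"]
```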
3.1 Without BYPASS_SCHEMA_VERSION=true
As expected, the script fails and prints the appropriate pending migrations:
Output
root@75337c7ebbb6:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-21T13:31:36.282891 #2028] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
NOTICE: There are 3 pending migrations.
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@75337c7ebbb6:/scripts# echo $?
1
3.2 With BYPASS_SCHEMA_VERSION=true
In this case, the script will ignore the pending migrations and pass.
Output
root@75337c7ebbb6:/scripts# WAIT_FOR_TIMEOUT=2 BYPASS_SCHEMA_VERSION=true CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-21T13:32:07.424017 #2046] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
[ClickHouse] WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
NOTICE: There are 3 pending migrations.
root@75337c7ebbb6:/scripts# echo $?
0
4. Database Setup With Pending Migrations for ClickHouse Only
As an extension of the above case, consider the case where all the migrations have been executed on Postgres but some are still pending in ClickHouse.
4.1 Default Case
By default, the script will pass and will not raise an error. This is to provide users with a smooth upgrade process.
The failure mode this default avoids is the following: if a ClickHouse-enabled GitLab user uses a CNG image with this check enforced and either does not use the GitLab-Migrations chart or uses a version of the GitLab-Migrations chart without !2624 (merged), then their Sidekiq and Webservice pods will not start up, because ClickHouse migrations may not have been run automatically for them. We intend to update the default to the value of the BYPASS_SCHEMA_VERSION variable after the required stop of %18.5. We discussed this plan in a comment thread of this MR.
Output
root@bcb872d996e7:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-27T06:07:55.829727 #1739] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Database Schema - main (gitlabhq_production)
root@bcb872d996e7:/scripts-new# echo $?
0
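The difference between the current default and the planned one (inheriting from BYPASS_SCHEMA_VERSION) can be sketched in Ruby as follows. The method names and the `inherit_general_default` flag are hypothetical. How an unset or empty BYPASS_SCHEMA_VERSION would be interpreted is an assumption made here for illustration and is not confirmed by this MR.

```ruby
# Illustrative only: hypothetical helpers, not the MR's implementation.

# "false" and "0" are the only values that turn a bypass off.
def disabling_value?(value)
  %w[false 0].include?(value)
end

# env can be ENV or any Hash-like object with string keys.
def clickhouse_bypass?(env, inherit_general_default:)
  explicit = env['BYPASS_CLICKHOUSE_SCHEMA_VERSION']
  # An explicitly set value always wins.
  return !disabling_value?(explicit) unless explicit.nil?

  # Current default: bypass when nothing is set.
  return true unless inherit_general_default

  # Planned default: fall back to BYPASS_SCHEMA_VERSION.
  # Assumption: an unset or empty value means "do not bypass".
  general = env['BYPASS_SCHEMA_VERSION'].to_s
  !general.empty? && !disabling_value?(general)
end

puts clickhouse_bypass?({}, inherit_general_default: false) # true
puts clickhouse_bypass?({}, inherit_general_default: true)  # false
puts clickhouse_bypass?({ 'BYPASS_SCHEMA_VERSION' => 'true' }, inherit_general_default: true) # true
```

With the current default, an empty environment bypasses the check, which matches the output above. With the planned default, the same environment would enforce it.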
4.2 Explicit Bypass Disabled Case BYPASS_CLICKHOUSE_SCHEMA_VERSION='false'
If the user has explicitly enabled the check, then the script will fail.
Output
root@bcb872d996e7:/scripts-new# BYPASS_CLICKHOUSE_SCHEMA_VERSION='false' WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-27T06:07:46.894430 #1721] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@bcb872d996e7:/scripts-new# echo $?
1
5. Database Setup With All Migrations Executed
In this case, the script always passes.
root@75337c7ebbb6:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-21T13:36:51.523837 #2252] INFO -- : SELECT version FROM schema_migrations
[ClickHouse] INFO: There are 0 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
root@75337c7ebbb6:/scripts# echo $?
0
6. Testing Against HTTPS ClickHouse Cloud
I tested against ClickHouse Cloud, which is served over HTTPS (as all my local tests were over HTTP). Some migrations had already been run on the ClickHouse Cloud instance, so the script correctly showed the number of pending migrations:
Output
root@41dc8933bb0c:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-22T08:31:00.429463 #69] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 5 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@41dc8933bb0c:/scripts# echo $?
1
After running migrations, the number went down to 0:
root@41dc8933bb0c:/scripts# /scripts/db-migrate
[snip]
Running gitlab:clickhouse:migrate:main rake task
== 20250813093423 CreateHierarchyMrTable: migrating ===========================
== 20250813093423 CreateHierarchyMrTable: migrated (0.3786s) ==================
== 20250813094645 CreateHierarchyMrMv: migrating ==============================
== 20250813094645 CreateHierarchyMrMv: migrated (0.2778s) =====================
== 20250813102642 AddTraversalIdsToSiphonIssues: migrating ====================
== 20250813102642 AddTraversalIdsToSiphonIssues: migrated (0.0908s) ===========
== 20250818064118 RecreateChatAiTrackingMv: migrating =========================
== 20250818064118 RecreateChatAiTrackingMv: migrated (0.0744s) ================
== 20250818064314 RecreateCodeSuggestionsAiTrackingMv: migrating ==============
== 20250818064314 RecreateCodeSuggestionsAiTrackingMv: migrated (0.1534s) =====
[snip]
root@41dc8933bb0c:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-22T08:37:01.581399 #326] INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 0 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
7. Error Case (ClickHouse DB Not Available)
Stop the ClickHouse container running locally:
$ docker container stop gitlab-rails-clickhouse-db-1
gitlab-rails-clickhouse-db-1
The check should print the appropriate error for ClickHouse and fail.
root@a21c67e5850d:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-30T03:08:03.170085 #281] INFO -- : SELECT version FROM schema_migrations
[ClickHouse] FATAL: Error while checking schema versions for ClickHouse main DB: Failed to open TCP connection to clickhouse-db:8123 (getaddrinfo: Temporary failure in name resolution)
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@a21c67e5850d:/scripts-new# echo $?
1
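For reference, the same query can be issued against ClickHouse's HTTP interface from plain Ruby, with connection failures turned into a result rather than an unhandled exception. This is a minimal sketch using only the standard library; `fetch_applied_versions` and `check_clickhouse` are hypothetical names, and the MR's check may go through GitLab's own ClickHouse client instead. The `use_ssl` switch covers both the local HTTP setup and an HTTPS endpoint such as ClickHouse Cloud.

```ruby
require 'net/http'
require 'uri'

# Queries the ClickHouse HTTP interface. The default output format is
# TabSeparated, so a single-column SELECT yields one version per line.
def fetch_applied_versions(base_url, database:, user:, password:)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(database: database,
                                  query: 'SELECT version FROM schema_migrations')
  request = Net::HTTP::Get.new(uri)
  request['X-ClickHouse-User'] = user
  request['X-ClickHouse-Key'] = password

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: 2, read_timeout: 2) do |http|
    http.request(request)
  end
  raise "HTTP #{response.code}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)

  response.body.split("\n")
end

# Turns any failure (DNS, refused connection, timeout, HTTP error) into a result pair.
def check_clickhouse(fetcher)
  [:ok, fetcher.call]
rescue StandardError => e
  [:failed, e.message]
end

# Stand-in fetcher that fails the way a stopped container would.
stopped = -> { raise SocketError, 'getaddrinfo: Temporary failure in name resolution' }
status, detail = check_clickhouse(stopped)
puts "#{status}: #{detail}"
```

Against the local setup, the fetcher would be something like `-> { fetch_applied_versions('http://clickhouse-db:8123', database: 'gitlab_clickhouse_production', user: 'user', password: 'example') }`. Passing the fetcher in as a callable keeps the error handling testable without a running ClickHouse instance.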
Checklist
See Definition of done.
For anything in this list which will not be completed, please provide a reason in the MR discussion
Required
- Merge Request Title, and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
- When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow
Expected (please provide an explanation if not completing)
- Test plan indicating conditions for success has been posted and passes
- Documentation created/updated
  - This will be covered along with a follow-up of gitlab-org/charts/gitlab!4458 (merged).
- Integration tests added to GitLab QA - Not required, I think
- The impact any change in container size has should be evaluated - No impact
- New dependencies are managed with GitLab forked renovatebot - No new dependencies