feat: Add schema version check script for ClickHouse DB

What does this MR do?

Related to gitlab-com/gl-infra/delivery#21436 (closed) and gitlab-org/charts/gitlab!4458 (merged).

This MR adds a check to the gitlab-rails/scripts/wait-for-deps script that confirms whether all ClickHouse migrations have been executed. The wait-for-deps script is run by the dependencies init container of the GitLab Helm chart installation. When this script fails, start-up is blocked for the main containers of the webservice and sidekiq pods (the containers in which the GitLab Rails codebase runs). wait-for-deps already checks that the mandatory dependencies of GitLab, Redis and PostgreSQL, are available, and that all regular migrations have been executed for PostgreSQL.

This check is introduced in the "default to off" state, so it will not fail when a user upgrades their gitlab-rails image to a version containing this commit. This default will be maintained until the required stop release %18.5. Starting in %18.6, the default will use the value of BYPASS_SCHEMA_VERSION. Users may set BYPASS_CLICKHOUSE_SCHEMA_VERSION=false to enable this check immediately.

In gitlab-org/charts/gitlab!4458 (merged), we are going to enable migrations for ClickHouse to run in Helm within the GitLab-Migrations chart. Before merging that change, we want a mechanism to confirm that ClickHouse migrations are executed before starting up the Rails container.

The schema version check for ClickHouse follows the same logic as lib/checks/postgresql.rb. It introduces a new environment variable, BYPASS_CLICKHOUSE_SCHEMA_VERSION, that users can set to disable the version check for ClickHouse while keeping it enabled for Postgres. The check honors the following variables:

  1. BYPASS_CLICKHOUSE_SCHEMA_VERSION: If set to anything except false or 0, the check will pass even if some regular migrations have not been executed in ClickHouse. If the DB has not been created or no migrations have been run, the check will fail even if this environment variable is set to true.
  2. BYPASS_POST_DEPLOYMENT=true: If set, the check will pass as long as regular migrations are executed. ClickHouse has support for post-deployment migrations, though there are no PDMs as of now in the GitLab codebase.
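
The pass/fail rules for BYPASS_CLICKHOUSE_SCHEMA_VERSION described above can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: the method names are hypothetical, exact string matching on "false" and "0" is an assumption, and BYPASS_SCHEMA_VERSION and BYPASS_POST_DEPLOYMENT are deliberately not modeled.

```ruby
# Sketch only: the method names are hypothetical, not taken from the MR.

# Any value except "false" or "0" counts as a bypass, including unset (nil)
# and the empty string. Exact string matching is an assumption.
def clickhouse_bypass?(value)
  !%w[false 0].include?(value.to_s)
end

# Returns true when the ClickHouse check should pass.
def clickhouse_check_passes?(initialized:, pending:, bypass_value:)
  return false unless initialized # an uninitialized DB fails even when bypassed
  return true if pending.zero?    # all migrations have been executed

  clickhouse_bypass?(bypass_value) # pending migrations pass only when bypassed
end

# Two pending migrations with the check explicitly enabled: the check fails.
puts clickhouse_check_passes?(initialized: true, pending: 2, bypass_value: 'false')
```

With bypass_value set to nil or '' (the default state), the same call would return true, which matches the "default to off" behavior.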

I tested this locally by running the dependencies (Redis, PostgreSQL, and ClickHouse) using Docker Compose, and running a GitLab Rails container within the same network and executing the wait-for-deps script in various scenarios.

Related issues

gitlab-com/gl-infra/delivery#21436 (closed)

Testing

I tested this locally by running the dependencies using Docker Compose:

docker-compose.yml
services:
  redis-service:
    image: "redis:alpine"
    networks:
      - gitlab

  postgres-db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: example
      POSTGRES_USER: user
      # Don't specify DB here. GitLab will create its database by itself using Rake tasks.
      # POSTGRES_DB: gitlabhq_production
    networks:
      - gitlab

  clickhouse-db:
    image: clickhouse
    environment:
      CLICKHOUSE_PASSWORD: example
      CLICKHOUSE_USER: user
      CLICKHOUSE_DB: gitlab_clickhouse_production
      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: 1
    networks:
      - gitlab

networks:
  gitlab:

I wrote the appropriate YAML configurations for these as well:

YAML configurations for gitlab-rails
$ for i in config/*.yml; do echo "# $i"; cat $i; echo "# ---"; done
# config/click_house.yml
production:
  main:
    database: gitlab_clickhouse_production
    url: "http://clickhouse-db:8123"
    username: user
    password: example
    variables:
      enable_http_compression: 1
      date_time_input_format: basic # needed for CH cloud
# ---
# config/database.yml
production:
  main:
    adapter: postgresql
    encoding: unicode
    database: gitlabhq_production
    username: user
    password: "example"
    host: postgres-db
  ci:
    adapter: postgresql
    encoding: unicode
    database: gitlabhq_production
    username: user
    password: "example"
    host: postgres-db
    database_tasks: false
# ---
# config/resque.yml
production:
  url: redis://redis-service:6379/0
# ---
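
As a quick sanity check of a ClickHouse configuration like the one above, the YAML can be parsed to list the configured databases. This is only a sketch using Ruby's standard YAML library on an inline copy of the config; it is not how wait-for-deps itself reads the file.

```ruby
require 'yaml'

# Inline copy of the relevant part of config/click_house.yml from this test setup.
config_text = <<~YAML
  production:
    main:
      database: gitlab_clickhouse_production
      url: "http://clickhouse-db:8123"
      username: user
      password: example
YAML

# Each key under "production" is one ClickHouse database configuration.
databases = YAML.safe_load(config_text).fetch('production')

databases.each do |name, settings|
  puts "#{name}: #{settings.fetch('url')} (database #{settings.fetch('database')})"
end
```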

With these in place, start all the dependencies using Docker Compose:

docker compose -f docker-compose.yml up

Then, start a GitLab Rails container using this MR's image in the network that was created by Docker Compose:

$ docker run --network gitlab-rails_gitlab \
		   -it --rm \
		   -v ./config/database.yml:/srv/gitlab/config/database.yml \
		   -v ./config/resque.yml:/srv/gitlab/config/resque.yml \
		   -v ./config/click_house.yml:/srv/gitlab/config/click_house.yml \
		   -w /scripts \
		   --name webservice-testing \
		   registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ee:sk-development-clickhouse-migration-check -- bash

This container can now reach all the dependency containers, so I tested the wait-for-deps script in various situations one by one.

1. Database Not Set Up

Neither the Postgres DB nor the ClickHouse DB exists, so both checks fail. The check for Redis succeeds because it does not have migrations.

Output
$ docker run --network gitlab-rails_gitlab \
                         -it --rm \
                         -v ./config/database.yml:/srv/gitlab/config/database.yml \
                         -v ./config/resque.yml:/srv/gitlab/config/resque.yml \
                         -v ./config/click_house.yml:/srv/gitlab/config/click_house.yml \
                         -v ./scripts:/scripts-overlay \
                         -w /scripts \
                         --name webservice-testing \
                         registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ee:sk-check-clickhouse-migration-state -- bash
Begin parsing .erb templates from /srv/gitlab/config
Begin parsing .tpl templates from /srv/gitlab/config
root@a21c67e5850d:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-30T03:02:12.967336 #83]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] FATAL: Error while fetching the database versions for ClickHouse main DB: Code: 60. DB::Exception: Unknown table expression identifier 'schema_migrations' in scope SELECT version FROM schema_migrations. (UNKNOWN_TABLE) (version 25.7.4.11 (official build))
[ClickHouse] NOTICE: Database has not been initialized yet.
[ClickHouse] INFO: There are 126 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Error checking main: We could not find your database: gitlabhq_production. Available database configurations can be found in config/database.yml.

To resolve this error:

- Did you not create the database, or did you delete it? To create the database, run:

    bin/rails db:create

- Has the database name changed? Verify that config/database.yml contains the correct database name.
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@a21c67e5850d:/scripts-new# echo $?
1

All the commands are run in sequence inside a similar container.

2. Database Setup

Postgres and ClickHouse DBs can be set up using a couple of Rake tasks:

root@a21c67e5850d:/scripts-new# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile db:setup
Missing Rails.application.credentials.secret_key_base for production environment. The secret will be generated and stored in config/secrets.yml.
[snip]
Creating a backup of secrets file /srv/gitlab/config/secrets.yml at /srv/gitlab/tmp/backups/secrets.yml.orig.1756523038
Created database 'gitlabhq_production'

root@a21c67e5850d:/scripts-new# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile gitlab:clickhouse:setup
Running gitlab:clickhouse:setup:main rake task
root@a21c67e5850d:/scripts-new#

This is not enough for the check to pass, even with BYPASS_SCHEMA_VERSION=true, because the DBs exist but have not been initialized:

Output
root@a21c67e5850d:/scripts-new# BYPASS_SCHEMA_VERSION=true WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-30T03:05:35.777151 #245]  INFO -- : SELECT version FROM schema_migrations
[ClickHouse] NOTICE: Database has not been initialized yet.
[ClickHouse] INFO: There are 126 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@a21c67e5850d:/scripts-new# echo $?
1

As seen here, ClickHouse reports 126 pending migrations.

3. Database Setup With Pending Migrations for Postgres and ClickHouse

I ran all the regular migrations for Postgres and then rolled back 3 migrations.

For ClickHouse, I tried gitlab:clickhouse:migrate VERSION=... but that kept giving me an error. So, I moved some migrations out of the directory, ran all the migrations, and then moved the migrations back.

# Move ClickHouse migrations out
root@75337c7ebbb6:/scripts# mv /srv/gitlab/db/click_house/migrate/main/{20250808063619_create_hierarchy_audit_events.rb,20250808064130_create_hierarchy_audit_events_mv.rb} /srv/gitlab/db/click_house/schema_migrations/main/{20250808063619,20250808064130} /tmp

# ... run all the migrations ...

root@75337c7ebbb6:/scripts# /scripts/db-migrate

# ... move the ClickHouse migrations back.
root@75337c7ebbb6:/scripts# cp -v /tmp/{20250808063619_create_hierarchy_audit_events.rb,20250808064130_create_hierarchy_audit_events_mv.rb} /srv/gitlab/db/click_house/migrate/main/; cp -v /tmp/{20250808063619,20250808064130} /srv/gitlab/db/click_house/schema_migrations/main/
'/tmp/20250808063619_create_hierarchy_audit_events.rb' -> '/srv/gitlab/db/click_house/migrate/main/20250808063619_create_hierarchy_audit_events.rb'
'/tmp/20250808064130_create_hierarchy_audit_events_mv.rb' -> '/srv/gitlab/db/click_house/migrate/main/20250808064130_create_hierarchy_audit_events_mv.rb'
'/tmp/20250808063619' -> '/srv/gitlab/db/click_house/schema_migrations/main/20250808063619'
'/tmp/20250808064130' -> '/srv/gitlab/db/click_house/schema_migrations/main/20250808064130'

# ... Rollback Postgres by 3 steps
root@75337c7ebbb6:/scripts# /srv/gitlab/bin/rake -f /srv/gitlab/Rakefile db:rollback STEP=3

At this point, we have 3 pending Postgres migrations and 2 pending ClickHouse migrations.
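
Why hiding two migration files during the run leaves exactly two pending ClickHouse migrations can be illustrated with a small sketch. It assumes that "pending" means "a migration file exists whose version is not recorded in schema_migrations" and that the version is the leading digits of the file name; both helper names and the earlier migration file are hypothetical.

```ruby
# Sketch only: helper names are hypothetical, not taken from the MR.

# Extract the leading timestamp from a migration file name (assumed convention).
def migration_version(filename)
  File.basename(filename)[/\A\d+/]
end

# Versions present on disk but missing from the applied list.
def pending_versions(migration_files, applied_versions)
  migration_files.map { |f| migration_version(f) }.compact.uniq.sort - applied_versions
end

files = %w[
  20250101000000_example_earlier_migration.rb
  20250808063619_create_hierarchy_audit_events.rb
  20250808064130_create_hierarchy_audit_events_mv.rb
]

# Only the (hypothetical) earlier migration ran while the other two were in /tmp.
applied = %w[20250101000000]

pending = pending_versions(files, applied)
puts "There are #{pending.size} migrations pending."
```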

3.1 Without BYPASS_SCHEMA_VERSION=true

As expected, the script fails and prints the appropriate pending migrations:

Output
root@75337c7ebbb6:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-21T13:31:36.282891 #2028]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
NOTICE: There are 3 pending migrations.
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@75337c7ebbb6:/scripts# echo $?
1

3.2 With BYPASS_SCHEMA_VERSION=true

In this case, the script will ignore the pending migrations and pass.

Output
root@75337c7ebbb6:/scripts# WAIT_FOR_TIMEOUT=2 BYPASS_SCHEMA_VERSION=true CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-21T13:32:07.424017 #2046]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
[ClickHouse] WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
NOTICE: There are 3 pending migrations.
root@75337c7ebbb6:/scripts# echo $?
0

4. Database Setup With Pending Migrations for ClickHouse Only

As an extension to the above case, we assume a case where all the migrations have been executed on Postgres but some are still pending in ClickHouse.

4.1 Default Case

By default, the script will pass and will not raise an error. This is to provide users with a smooth upgrade process.

If this check were enabled by default, a ClickHouse-enabled GitLab user who uses a CNG image with this check, and who either does not use the GitLab-Migrations chart or uses a version of that chart without !2624 (merged), would find that their Sidekiq and Webservice pods do not start up, because ClickHouse migrations might not have been run automatically for them. We intend to update the default to the value of the BYPASS_SCHEMA_VERSION variable after the required stop of %18.5. We discussed this plan in a comment thread of this MR.

Output
root@bcb872d996e7:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-27T06:07:55.829727 #1739]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
[ClickHouse] INFO: schema version check bypassed by BYPASS_CLICKHOUSE_SCHEMA_VERSION=''
Checking: main
Database Schema - main (gitlabhq_production)
root@bcb872d996e7:/scripts-new# echo $?
0

4.2 Explicit Bypass Disabled Case BYPASS_CLICKHOUSE_SCHEMA_VERSION='false'

If the user has explicitly enabled the check, then the script will fail.

Output
root@bcb872d996e7:/scripts-new# BYPASS_CLICKHOUSE_SCHEMA_VERSION='false' WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-27T06:07:46.894430 #1721]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 2 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@bcb872d996e7:/scripts-new# echo $?
1

5. Database Setup With All Migrations Executed

In this case, the script always passes.

root@75337c7ebbb6:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-21T13:36:51.523837 #2252]  INFO -- : SELECT version FROM schema_migrations
[ClickHouse] INFO: There are 0 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
root@75337c7ebbb6:/scripts# echo $?
0

6. Testing Against HTTPS ClickHouse Cloud

I tested against ClickHouse Cloud, which is served over HTTPS (as all my local tests were over HTTP). Some migrations had already been run on the ClickHouse Cloud instance, so the script correctly showed the number of pending migrations:

Output
root@41dc8933bb0c:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-22T08:31:00.429463 #69]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 5 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@41dc8933bb0c:/scripts# echo $?
1

After running migrations, the number went down to 0:

root@41dc8933bb0c:/scripts# /scripts/db-migrate
[snip]

Running gitlab:clickhouse:migrate:main rake task
== 20250813093423 CreateHierarchyMrTable: migrating ===========================
== 20250813093423 CreateHierarchyMrTable: migrated (0.3786s) ==================

== 20250813094645 CreateHierarchyMrMv: migrating ==============================
== 20250813094645 CreateHierarchyMrMv: migrated (0.2778s) =====================

== 20250813102642 AddTraversalIdsToSiphonIssues: migrating ====================
== 20250813102642 AddTraversalIdsToSiphonIssues: migrated (0.0908s) ===========

== 20250818064118 RecreateChatAiTrackingMv: migrating =========================
== 20250818064118 RecreateChatAiTrackingMv: migrated (0.0744s) ================

== 20250818064314 RecreateCodeSuggestionsAiTrackingMv: migrating ==============
== 20250818064314 RecreateCodeSuggestionsAiTrackingMv: migrated (0.1534s) =====

[snip]

root@41dc8933bb0c:/scripts# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-22T08:37:01.581399 #326]  INFO -- : SELECT version FROM schema_migrations
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: There are 0 migrations pending.
Checking: main
Database Schema - main (gitlabhq_production)
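
The HTTP versus HTTPS difference exercised in this section can be sketched with Ruby's standard Net::HTTP library. This is an illustration only: the helper name and the cloud hostname are hypothetical, and the ClickHouse client used by GitLab may configure its connection differently.

```ruby
require 'net/http'
require 'uri'

# Build (but do not open) an HTTP client for a ClickHouse URL, enabling TLS
# only when the URL scheme is https. The helper name is hypothetical.
def clickhouse_http_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http
end

local = clickhouse_http_client('http://clickhouse-db:8123')
# Hypothetical hostname standing in for a ClickHouse Cloud endpoint.
cloud = clickhouse_http_client('https://example.clickhouse.cloud:8443')

puts "local TLS: #{local.use_ssl?}, cloud TLS: #{cloud.use_ssl?}"
```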

7. Error Case (ClickHouse DB Not Available)

Stop the ClickHouse container running locally:

$ docker container stop gitlab-rails-clickhouse-db-1
gitlab-rails-clickhouse-db-1

The check should print the appropriate error for ClickHouse and fail.

root@a21c67e5850d:/scripts-new# WAIT_FOR_TIMEOUT=2 CONFIG_DIRECTORY=/srv/gitlab/config ./wait-for-deps
Checking: resque.yml
[ClickHouse] INFO: Configuring ClickHouse DB main
+ SUCCESS connecting to 'redis://redis-service:6379' from resque.yml, through redis-service
[ClickHouse] INFO: Checking migration schema state for ClickHouse database main
[ClickHouse] INFO: ClickHouse - Database main
I, [2025-08-30T03:08:03.170085 #281]  INFO -- : SELECT version FROM schema_migrations
[ClickHouse] FATAL: Error while checking schema versions for ClickHouse main DB: Failed to open TCP connection to clickhouse-db:8123 (getaddrinfo: Temporary failure in name resolution)
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: Not all services were operational, with data migrations completed.
If this container continues to fail, please see: https://docs.gitlab.com/charts/troubleshooting/index.html#application-containers-constantly-initializing
root@a21c67e5850d:/scripts-new# echo $?
1
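
The behavior in this case, where a connection error is reported and counted as a failed check rather than crashing the script, can be sketched as below. The wrapper name and the set of rescued exception classes are assumptions for illustration, not the MR's actual implementation.

```ruby
require 'socket'

# Run a check block and convert connection-level errors into a failed result.
# The method name and the rescued classes are assumptions.
def run_check(name)
  yield
rescue SocketError, SystemCallError => e
  warn "[#{name}] FATAL: #{e.message}"
  false
end

# Simulate the name-resolution failure seen when the ClickHouse container is stopped.
unreachable = run_check('ClickHouse') do
  raise SocketError, 'getaddrinfo: Temporary failure in name resolution'
end

healthy = run_check('ClickHouse') { true }

# A non-zero exit code blocks start-up, as in the output above.
exit_code = unreachable && healthy ? 0 : 1
puts "exit code: #{exit_code}"
```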

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion

Required

  • Merge Request Title, and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Integration tests added to GitLab QA
    • Not required, I think
  • The impact any change in container size has should be evaluated
    • No impact
  • New dependencies are managed with GitLab forked renovatebot
    • No new dependencies
Edited by Siddharth Kannan
