Unable to skip ClickHouse migrations
Summary
Even when the ClickHouse migrations are skipped via the environment variable introduced in gitlab-org/build/CNG!2682 (merged), GitLab Rails pods are still prevented from starting when the ClickHouse database doesn't exist yet.
This was originally required for Dedicated, as the ClickHouse database is only created after the initial installation of the chart.
See Schema version check for ClickHouse is preventi... (gitlab-com/gl-infra/delivery#21637 - closed) for additional context.
When SKIP_CLICKHOUSE_SCHEMA_VERSION_CHECK is passed as an environment variable via the migrations chart, the migrations k8s Job fails:
Defaulted container "migrations" out of: migrations, certificates (init), configure (init)
Begin parsing .erb templates from /var/opt/gitlab/templates
Writing /srv/gitlab/config/cable.yml
Writing /srv/gitlab/config/click_house.yml
Writing /srv/gitlab/config/database.yml
Writing /srv/gitlab/config/gitlab.yml
Writing /srv/gitlab/config/resque.yml
Begin parsing .tpl templates from /var/opt/gitlab/templates
Copying other config files found in /var/opt/gitlab/templates to /srv/gitlab/config
Attempting to run '/bin/bash -c set -e;
/scripts/wait-for-deps;
/scripts/db-migrate;
' as a main process
[ClickHouse] WARN: ClickHouse migration check explicitly skipped. This is NOT recommended for production environments.
Checking: resque.yml, cable.yml
+ SUCCESS connecting to 'rediss://master.sbclickhouse-redis.qiymyz.euc1.cache.amazonaws.com:6379' from resque.yml, through master.sbclickhouse-redis.qiymyz.euc1.cache.amazonaws.com
+ SUCCESS connecting to 'rediss://master.sbclickhouse-redis.qiymyz.euc1.cache.amazonaws.com:6379' from cable.yml, through master.sbclickhouse-redis.qiymyz.euc1.cache.amazonaws.com
Checking: main
Database Schema - main (gitlabhq_production)
WARNING: schema version check bypassed by BYPASS_SCHEMA_VERSION='true'
Checking database migrations are up-to-date
Performing migrations (this will initialized if needed)
Skipping Topology Service health check due to the cell being disabled
Running db:migrate rake task
main: == [advisory_lock_connection] object_id: 77240, pg_backend_pid: 17387
main: == [advisory_lock_connection] object_id: 77240, pg_backend_pid: 17387
Running gitlab:clickhouse:migrate:main rake task
rake aborted!
ClickHouse::Client::DatabaseError: Code: 516. DB::Exception: gitlab: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 25.8.1.8702 (official build))
/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/click_house-client-0.8.0/lib/click_house/client.rb:131:in `block in instrumented_execute'
/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/activesupport-7.1.5.2/lib/active_support/notifications.rb:206:in `block in instrument'
/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/activesupport-7.1.5.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/activesupport-7.1.5.2/lib/active_support/notifications.rb:206:in `instrument'
/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/click_house-client-0.8.0/lib/click_house/client.rb:119:in `instrumented_execute'
/srv/gitlab/vendor/bundle/ruby/3.2.0/gems/click_house-client-0.8.0/lib/click_house/client.rb:48:in `select'
/srv/gitlab/lib/click_house/connection.rb:13:in `select'
/srv/gitlab/lib/click_house/connection.rb:64:in `table_exists?'
/srv/gitlab/lib/click_house/migration_support/schema_migration.rb:12:in `ensure_table'
/srv/gitlab/lib/tasks/gitlab/click_house/migration.rake:137:in `migrate'
/srv/gitlab/lib/tasks/gitlab/click_house/migration.rake:43:in `block (5 levels) in <main>'
/srv/gitlab/lib/tasks/gitlab/click_house/migration.rake:60:in `block (4 levels) in <main>'
/srv/gitlab/lib/tasks/gitlab/click_house/migration.rake:58:in `each'
/srv/gitlab/lib/tasks/gitlab/click_house/migration.rake:58:in `block (3 levels) in <main>'
/srv/gitlab/lib/tasks/gitlab/db.rake:176:in `configure_clickhouse_databases'
/srv/gitlab/lib/tasks/gitlab/db.rake:103:in `block (3 levels) in <main>'
Tasks: TOP => gitlab:clickhouse:migrate:main
(See full trace by running task with --trace)
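With the help of @WarheadsSE, we traced this back to https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake#L37, which does not allow the actual migration to be skipped. As the log above shows, the variable only bypasses the schema version check; the gitlab:clickhouse:migrate:main rake task still runs and aborts when it queries ClickHouse.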
Impact
- GitLab Dedicated has to onboard new tenants without ClickHouse enabled
- Utilizing the env variable SKIP_CLICKHOUSE_SCHEMA_VERSION_CHECK results in a failing Job, leaving the instance in a state where we are not sure if it's safe to proceed
- Already up and running GitLab Dedicated tenants are not affected
Recommendation
Allow ClickHouse migrations to be skipped without failure.
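One possible shape for this, as a minimal sketch only: the migration entry point consults the same environment variable before it touches ClickHouse at all. The helper names `skip_clickhouse_migrations?` and `run_clickhouse_migrations` are hypothetical and not taken from the GitLab codebase, and treating `YesReally` as the only accepted value is an assumption based on the tenant model shown under Verification.

```ruby
# Hypothetical sketch, NOT the actual GitLab rake task.
SKIP_VARIABLE = 'SKIP_CLICKHOUSE_SCHEMA_VERSION_CHECK'

# Returns true when the operator explicitly asked to skip ClickHouse work.
# Assumption: 'YesReally' is the only value that enables the skip.
def skip_clickhouse_migrations?(env = ENV)
  env[SKIP_VARIABLE] == 'YesReally'
end

# Runs the given migration block unless skipping was requested.
# Returns :skipped or :migrated so callers can tell which path was taken.
def run_clickhouse_migrations(env = ENV, out: $stdout)
  if skip_clickhouse_migrations?(env)
    out.puts "[ClickHouse] WARN: #{SKIP_VARIABLE} is set, skipping ClickHouse migrations."
    return :skipped
  end

  yield # the real migration work would happen here
  :migrated
end
```

The point of the sketch is that the guard sits in front of any ClickHouse connection attempt, so a missing database or user cannot abort the Job when the skip is requested.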
Verification
- Deploy a new GitLab Dedicated tenant via Switchboard
- Make the following changes to the tenant model:
{
  "clickhouse": {
    "enabled": true
  },
  "chart_extra_settings": {
    "webservice": {
      "extraEnv": {
        "SKIP_CLICKHOUSE_SCHEMA_VERSION_CHECK": "YesReally"
      }
    },
    "migrations": {
      "extraEnv": {
        "SKIP_CLICKHOUSE_SCHEMA_VERSION_CHECK": "YesReally"
      }
    }
  }
}
- Deploy the tenant and, once the migrations chart is installed, check the output of the migrations Pod
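To make the last step less error-prone, the Pod output could be checked mechanically. The following is a rough sketch under stated assumptions: the helper names `classify_migration_log` and `bug_reproduced?` are hypothetical, and the marker strings are copied from the log excerpt in this issue, so they may need adjusting if the log wording changes.

```ruby
# Hypothetical helper for reading the migrations Pod log.
# Marker strings are taken from the log excerpt in this issue.
SKIP_MARKER = 'ClickHouse migration check explicitly skipped'
CLICKHOUSE_TASK_MARKER = 'Running gitlab:clickhouse:migrate:main rake task'
ABORT_MARKER = 'rake aborted!'

# Summarises which of the three markers appear anywhere in the log text.
def classify_migration_log(log)
  lines = log.each_line.map(&:chomp)
  {
    skip_requested: lines.any? { |line| line.include?(SKIP_MARKER) },
    clickhouse_task_ran: lines.any? { |line| line.include?(CLICKHOUSE_TASK_MARKER) },
    aborted: lines.any? { |line| line.include?(ABORT_MARKER) }
  }
end

# True when the log shows the behaviour described in this issue:
# the skip was requested, yet the ClickHouse task still ran and rake aborted.
def bug_reproduced?(log)
  result = classify_migration_log(log)
  result[:skip_requested] && result[:clickhouse_task_ran] && result[:aborted]
end
```

Feeding it the log text captured from the Pod (for example, the output of `kubectl logs` saved to a file and read with `File.read`) should return `true` from `bug_reproduced?` while the problem persists, and `false` once the migration is skipped without failure.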