Upgrading self-managed GitLab to v18.5 leads to a failing DB migration when there is an instance-level Slack integration
Summary
Upgrading self-managed GitLab CE to v18.5 leads to a failing DB migration when there is an instance-level Slack integration.
Steps to reproduce
- Have a self-managed GitLab CE v18.4 instance that was previously affected by the bug described in !202790 (merged), that is, it has a (now disabled) instance-level Slack integration.
- Initiate an Omnibus package upgrade of gitlab-ce to v18.5.
Example Project
N/A - a complete GitLab instance is needed to demonstrate this.
What is the current bug behavior?
Downtime of the GitLab instance: sudo gitlab-ctl status shows most services as down.
A DB migration fails during, for example, gitlab-ctl reconfigure. The failure also blocks the remaining migrations required for v18.5, leaving GitLab unusable with most services in the down state.
What is the expected correct behavior?
sudo gitlab-ctl status shows all services up and running.
Relevant logs and/or screenshots
Most services are down:
sudo gitlab-ctl status
down: alertmanager: 46771s, normally up; run: log: (pid 922) 1861459s
down: crond: 46771s, normally up; run: log: (pid 913) 1861459s
run: gitaly: (pid 3969406) 46668s; run: log: (pid 920) 1861459s
down: gitlab-exporter: 46770s, normally up; run: log: (pid 938) 1861459s
down: gitlab-workhorse: 46770s, normally up; run: log: (pid 912) 1861459s
down: logrotate: 46769s, normally up; run: log: (pid 905) 1861459s
down: nginx: 46769s, normally up; run: log: (pid 923) 1861459s
down: node-exporter: 46769s, normally up; run: log: (pid 897) 1861459s
down: postgres-exporter: 46768s, normally up; run: log: (pid 907) 1861459s
run: postgresql: (pid 15276) 1861060s; run: log: (pid 921) 1861459s
down: prometheus: 46768s, normally up; run: log: (pid 900) 1861459s
down: puma: 46765s, normally up; run: log: (pid 903) 1861459s
run: redis: (pid 926) 1861459s; run: log: (pid 910) 1861459s
down: redis-exporter: 46765s, normally up; run: log: (pid 898) 1861459s
down: registry: 46764s, normally up; run: log: (pid 924) 1861459s
down: sidekiq: 46760s, normally up; run: log: (pid 902) 1861459s
A DB post-migration fails. Running gitlab-ctl reconfigure, for example, reproduces the error:
...
bash_hide_env[migrate gitlab-rails database] action run
[execute] Skipping Topology Service health check due to the cell being disabled
Running db:migrate rake task
main: == [advisory_lock_connection] object_id: 68660, pg_backend_pid: 4052022
main: == 20250922093672 IntegrationsValidateMultipleColumnNotNullConstraint: migrating
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0009s
main: -- execute("ALTER TABLE integrations VALIDATE CONSTRAINT check_2aae034509;")
main: -- execute("RESET statement_timeout")
main: == [advisory_lock_connection] object_id: 68660, pg_backend_pid: 4052022
rake aborted!
StandardError: An error has occurred, this and all later migrations canceled:
PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block
...
ActiveRecord::StatementInvalid: PG::CheckViolation: ERROR: check constraint "check_2aae034509" of relation "integrations" is violated by some row
The failing migration was added with !204744 (merged): IntegrationsValidateMultipleColumnNotNullConstraint in db/post_migrate/20250922093672_integrations_validate_multiple_column_not_null_constraint.rb.
That MR appears to validate that the sharding-key columns of the integrations DB table are populated in an allowed combination.
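To illustrate why an instance-level row trips such a check, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for PostgreSQL. The constraint body ("exactly one of the three keys is non-null") is an assumption inferred from the investigation in this report, not copied from the GitLab schema, and the table holds only the columns relevant here.

```python
import sqlite3

# In-memory stand-in for the real table; only the sharding-key columns matter here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE integrations (
        id INTEGER PRIMARY KEY,
        type_new TEXT,
        project_id INTEGER,
        group_id INTEGER,
        organization_id INTEGER,
        -- Assumed rule: exactly one of the three keys is non-null.
        CONSTRAINT check_2aae034509 CHECK (
            (project_id IS NOT NULL)
            + (group_id IS NOT NULL)
            + (organization_id IS NOT NULL) = 1
        )
    )
    """
)


def try_insert(project_id, group_id, organization_id):
    """Return True if the row satisfies the constraint, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO integrations "
                "(type_new, project_id, group_id, organization_id) "
                "VALUES (?, ?, ?, ?)",
                ("Integrations::GitlabSlackApplication",
                 project_id, group_id, organization_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


project_level_ok = try_insert(42, None, None)      # one key set
instance_level_ok = try_insert(None, None, None)   # no key set, like the Slack row
two_keys_ok = try_insert(1, 2, None)               # two keys set

print(project_level_ok, instance_level_ok, two_keys_ok)  # True False False
```

In the real upgrade the row already exists, so the violation only surfaces when the migration runs VALIDATE CONSTRAINT against the existing data.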
Connect to the DB:
sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql -d gitlabhq_production
Track down violators. Exactly 1 of the 3 columns project_id, group_id, organization_id should be non-null:
- Check for both group_id and organization_id being non-null (0 results):
  SELECT * FROM integrations WHERE group_id IS NOT NULL AND organization_id IS NOT NULL;
- Check for both project_id and organization_id being non-null (0 results):
  SELECT * FROM integrations WHERE project_id IS NOT NULL AND organization_id IS NOT NULL;
- Check for both group_id and project_id being non-null (0 results):
  SELECT * FROM integrations WHERE group_id IS NOT NULL AND project_id IS NOT NULL;
- Check for all 3 being null (HAS 1 VIOLATOR, FOR THE SLACK INTEGRATION):
  SELECT id, created_at, updated_at, active, category, instance, inherit_from_id, type_new FROM integrations WHERE group_id IS NULL AND project_id IS NULL AND organization_id IS NULL;
  6 | 2025-06-02 09:50:48.681851 | 2025-09-03 09:04:21.757086 | f | chat | t | | Integrations::GitlabSlackApplication
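The separate checks above can be collapsed into one query that counts the non-null keys per row. The sketch below runs against an in-memory SQLite table with made-up sample rows (the type name Integrations::Example and all IDs except the Slack row's are hypothetical). The CASE expressions should also work unchanged in PostgreSQL, where num_nonnulls(project_id, group_id, organization_id) <> 1 is a shorter equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE integrations ("
    "id INTEGER PRIMARY KEY, type_new TEXT, "
    "project_id INTEGER, group_id INTEGER, organization_id INTEGER)"
)
# Hypothetical sample data: three valid rows plus one row with no sharding key.
conn.executemany(
    "INSERT INTO integrations VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Integrations::Example", 10, None, None),
        (2, "Integrations::Example", None, 20, None),
        (3, "Integrations::Example", None, None, 1),
        (6, "Integrations::GitlabSlackApplication", None, None, None),
    ],
)

# A row is a violator when the number of non-null keys is anything but 1.
VIOLATORS_SQL = """
SELECT id, type_new
FROM integrations
WHERE (CASE WHEN project_id IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN group_id IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN organization_id IS NOT NULL THEN 1 ELSE 0 END) <> 1
ORDER BY id
"""

violators = conn.execute(VIOLATORS_SQL).fetchall()
print(violators)  # [(6, 'Integrations::GitlabSlackApplication')]
```

Running a query of this shape before the upgrade would, under the same assumption about the constraint, reveal whether the instance is affected.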
Fixed by deleting the violating record:
DELETE FROM integrations WHERE group_id IS NULL AND project_id IS NULL AND organization_id IS NULL;
Then running sudo gitlab-ctl reconfigure passed OK.
Output of checks
Results of GitLab environment info
Added output of sudo gitlab-rake gitlab:env:info below
sudo gitlab-rake gitlab:env:info

System information
System: Debian 12
Current User: git
Using RVM: no
Ruby Version: 3.2.8
Gem Version: 3.7.1
Bundler Version:2.7.1
Rake Version: 13.0.6
Redis Version: 7.2.10
Sidekiq Version:7.3.9
Go Version: unknown

GitLab information
Version: 18.5.0
Revision: a2f69d15eba
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 16.10
URL: REDACTED
HTTP Clone URL: REDACTED
SSH Clone URL: REDACTED
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: google_oauth2

GitLab Shell
Version: 14.45.3
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 18.5.0
- default Git Version: 2.50.1
Results of GitLab application Check
Did NOT manage to run gitlab:check before resolving the issue.
Workarounds
Manually connect to the DB and delete the violating record for the Slack integration:
- sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql -d gitlabhq_production (for Omnibus self-managed on a single node)
- DELETE FROM integrations WHERE group_id IS NULL AND project_id IS NULL AND organization_id IS NULL;
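Because the DELETE is irreversible, it may be worth copying the affected rows aside first and doing both steps in one transaction. The following is only a sketch of that pattern on an in-memory SQLite table with hypothetical sample rows. The backup table name integrations_backup and the helper backup_and_delete are illustrative and not part of GitLab. On a real instance the same two statements would be run in psql inside BEGIN and COMMIT.

```python
import sqlite3

# Same predicate as the workaround: no sharding key is set on the row.
ALL_KEYS_NULL = (
    "project_id IS NULL AND group_id IS NULL AND organization_id IS NULL"
)


def build_demo_db():
    """Create a small stand-in table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE integrations ("
        "id INTEGER PRIMARY KEY, type_new TEXT, "
        "project_id INTEGER, group_id INTEGER, organization_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO integrations VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Integrations::Example", 10, None, None),
            (2, "Integrations::Example", None, 20, None),
            (6, "Integrations::GitlabSlackApplication", None, None, None),
        ],
    )
    conn.commit()
    return conn


def backup_and_delete(conn):
    """Copy violating rows to a backup table, then delete them, atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "CREATE TABLE integrations_backup AS "
            f"SELECT * FROM integrations WHERE {ALL_KEYS_NULL}"
        )
        cur = conn.execute(f"DELETE FROM integrations WHERE {ALL_KEYS_NULL}")
    return cur.rowcount


conn = build_demo_db()
deleted = backup_and_delete(conn)
backed_up = conn.execute("SELECT id, type_new FROM integrations_backup").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM integrations").fetchone()[0]
print(deleted, backed_up, remaining)
```

Checking that the number of deleted rows matches the number of rows found during the investigation (1 in this report) is a cheap guard against deleting more than intended.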
Possible fixes
Not sure what the root cause is; the previously linked issues need to be analyzed. Possibly a bad assumption in !204744 (merged) that no rows have all 3 attributes set to NULL.