Ci::DeleteObjectsWorker fails with PG::CheckViolation: ERROR: new row for relation "ci_deleted_objects" violates check constraint "check_98f90d6c53"
Summary
The GitLab 17.4.0 update introduced several database migrations, including one that adds the check constraint "check_98f90d6c53":
execute("ALTER TABLE ci_deleted_objects\nADD CONSTRAINT check_98f90d6c53\nCHECK ( project_id IS NOT NULL )\nNOT VALID;\n")
This check was added with the "NOT VALID" clause, so it is not applied to existing rows. Only new row versions are checked, and that includes rows rewritten by an UPDATE.
However, the Sidekiq Ci::DeleteObjectsWorker runs an UPDATE on this table as part of its job. SQL query:
UPDATE "ci_deleted_objects" SET "pick_up_at" = $1 WHERE...
The query changes only pick_up_at, but it fails on existing rows where project_id is NULL. This floods Sentry with errors, because the worker runs every 16 minutes, and it increases storage use, because the workers that would normally pick up artifacts for deletion now crash every time.
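The behavior can be reproduced in isolation on any PostgreSQL instance. The following is a minimal sketch, not taken from GitLab itself: the table demo_deleted_objects and the constraint demo_project_id_not_null are hypothetical names chosen for illustration, and only the NOT VALID mechanics mirror the real migration.

```sql
-- Hypothetical scratch table; it is NOT part of the GitLab schema.
CREATE TABLE demo_deleted_objects (
    id         bigint PRIMARY KEY,
    project_id bigint,
    pick_up_at timestamptz NOT NULL DEFAULT now()
);

-- A legacy row created before the constraint existed.
INSERT INTO demo_deleted_objects (id, project_id) VALUES (1, NULL);

-- NOT VALID skips the scan of existing rows, so adding the constraint
-- succeeds even though row 1 violates it.
ALTER TABLE demo_deleted_objects
    ADD CONSTRAINT demo_project_id_not_null
    CHECK (project_id IS NOT NULL)
    NOT VALID;

-- Only pick_up_at changes, but UPDATE writes a new row version and the
-- constraint is evaluated against that whole new row, so this statement
-- is rejected with a check constraint violation.
UPDATE demo_deleted_objects SET pick_up_at = now() WHERE id = 1;

-- Clean up the scratch table.
DROP TABLE demo_deleted_objects;
```

The UPDATE in this sketch touches the same kind of column as the worker's query, which is why the worker fails even though it never writes project_id.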
Relevant logs and/or screenshots
https://sentry.ict.fit.cvut.cz/share/issue/bf332fbb596044f481b1f0a78e645b0d/
Output of checks
Results of GitLab environment info
System information
System: Ubuntu 22.04
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 3.1.5p253
Gem Version: 3.5.17
Bundler Version: 2.5.11
Rake Version: 13.0.6
Redis Version: 7.0.15
Sidekiq Version: 7.2.4
Go Version: unknown

GitLab information
Version: 17.4.1-ee
Revision: 40bdc966046
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 14.11
URL: https://gitlab.fit.cvut.cz
HTTP Clone URL: https://gitlab.fit.cvut.cz/some-group/some-project.git
SSH Clone URL: git@gitlab.fit.cvut.cz:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: yes
Using Omniauth: no

GitLab Shell
Version: 14.39.0
Repository storages:
- default: unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address: unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version: 17.4.1
- default Git Version: 2.46.0
Update
It seems that the fix was applied in a migration that runs before the "broken" migration, so the bug is still present on environments that updated directly to a version containing the fix without first going through the broken version.
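To check whether a given environment is still affected, two read-only queries are enough. This is a sketch that assumes you have a SQL console on the database that holds the CI tables (for example through gitlab-psql on an Omnibus install); it uses only the table and constraint names from this report plus the standard PostgreSQL catalog pg_constraint, and it changes no data.

```sql
-- Rows that make every UPDATE on them fail while the constraint is present.
-- A result of 0 means the worker should no longer hit the violation.
SELECT count(*) AS rows_without_project
FROM ci_deleted_objects
WHERE project_id IS NULL;

-- Whether the constraint exists, and whether it has been validated.
-- convalidated is false for a constraint that was added with NOT VALID
-- and never validated afterwards.
SELECT conname, convalidated
FROM pg_constraint
WHERE conrelid = 'ci_deleted_objects'::regclass
  AND conname = 'check_98f90d6c53';
```

If the first query returns a non-zero count and the second returns a row, the environment still has the problem described in this issue.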
Workaround
Run gitlab-rake db:migrate:up:ci RAILS_ENV=production VERSION=20241028085044 to update the invalid records, and then re-run the migrations. (If this command fails with an error about an invalid rake task, remove the constraint instead, as indicated below.)