Geo 14.6.0 error: Column "verification_state" of relation "pages_deployments" does not exist
Summary
After upgrade to 14.6.0 some Geo customers may face the following errors repeating every minute in their Sidekiq log:
PG::UndefinedColumn: ERROR: column "verification_state" of relation "pages_deployments" does not exist
LINE 1: ...config_name:main*/ UPDATE "pages_deployments" SET "verificat..."
The relation may also vary between pages_deployments
and lfs_objects
or uploads
Steps to reproduce
Reproducing this requires at least one Pages verification job that has exceeded its 8 hour runtime without failing/succeeding.
Example Project
This is not project specific.
What is the current bug behavior?
VerificationTimeoutWorker
fails with an error when it encounters a timed out Pages, LFS Object, or Upload related verification job and is unable to mark these jobs as timed out.
What is the expected correct behavior?
VerificationTimeoutWorker
does not fail with an error when marking some verification jobs as timed out.
Relevant logs and/or screenshots
{
"severity": "WARN",
"time": "2021-12-24T10:27:16.017Z", …
"queue": "geo:geo_verification_timeout",
"class": "Geo::VerificationTimeoutWorker",
"args": [
"pages_deployment"
],
"meta.caller_id": "Geo::VerificationCronWorker",
"meta.feature_category": "geo_replication",
"message": "Geo::VerificationTimeoutWorker JID-a947ff98b4813459b4319223: fail: 0.7699 sec",
"job_status": "fail",
"error_message": "PG::UndefinedColumn: ERROR: column \"verification_state\" of relation \"pages_deployments\" does not exist\nLINE 1: ...config_name:main*/ UPDATE \"pages_deployments\" SET \"verificat...\n ^\n",
"error_class": "ActiveRecord::StatementInvalid",
"error_backtrace": [
"lib/gitlab/database/load_balancing/connection_proxy.rb:126:in `block in write_using_load_balancer'",
"lib/gitlab/database/load_balancing/load_balancer.rb:112:in `block in read_write'",
"lib/gitlab/database/load_balancing/load_balancer.rb:172:in `retry_with_backoff'",
"lib/gitlab/database/load_balancing/load_balancer.rb:110:in `read_write'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:125:in `write_using_load_balancer'",
"lib/gitlab/database/load_balancing/connection_proxy.rb:67:in `block (2 levels) in <class:ConnectionProxy>'",
"ee/lib/gitlab/geo/verification_state.rb:246:in `block in fail_verification_timeouts'",
"app/models/concerns/each_batch.rb:98:in `block (2 levels) in each_batch'",
"app/models/concerns/each_batch.rb:98:in `block in each_batch'",
"app/models/concerns/each_batch.rb:68:in `step'",
"app/models/concerns/each_batch.rb:68:in `each_batch'",
"ee/lib/gitlab/geo/verification_state.rb:245:in `fail_verification_timeouts'",
"ee/app/models/concerns/geo/verifiable_replicator.rb:20:in `fail_verification_timeouts'",
"ee/app/workers/geo/verification_timeout_worker.rb:17:in `perform'", …
],
}
Output of checks
This bug happens on GitLab 14.6.0 self-managed instances using Geo replication features.
Possible fixes
Related 14.6.0 MRs:
The issue occurs due to an incorrect relation used when attempting to adjust the value for verification_state
while marking a verification job as failed due to time-out.
The approach in the MRs linked above that cause this error took a different approach than their predecessors such as !60935 (merged)
The difference was that in !60935 (merged) the verification_state
and associated columns were ALTERed into the base table of their model, whereas in the above MRs they were introduced as new tables via a 1:1 relation to the base table.
The code used when seeking out the entries to mark as timeout was not changed to account for this relationship difference:
- https://gitlab.com/gitlab-org/gitlab/blob/52f68e9cb7771ad636351a07483ca8c0ff59c5ce/ee/lib/gitlab/geo/verification_state.rb#L38
- https://gitlab.com/gitlab-org/gitlab/blob/52f68e9cb7771ad636351a07483ca8c0ff59c5ce/ee/lib/gitlab/geo/verification_state.rb#L234-248
As a result, in the following snippet when running for Pages data, the relation
fetched is PagesDeployment
instead of the expected PagesDeploymentState
, and PagesDeployment
cannot perform an update_all
with attributes referencing column verification_state
which resides in a different relation PagesDeploymentState
:
verification_timed_out.each_batch do |relation|
relation.update_all(attrs)
end
See also premium customer ticket https://gitlab.zendesk.com/agent/tickets/258741 (internal link)