Geo: adapt verification timed out query to use state table

Merged Aakriti Gupta requested to merge ag-fix-verification-state-upload-bug into master

What does this MR do and why?

Geo stores verification state of replicable models in the model table or in a separate state table.

The Geo::VerificationState concern handles both cases by adapting its queries to target whichever table holds the verification state.

One of the queries was still updating the model table even when a separate state table was in use. This caused the error shown below.

This MR updates the common scope, verification_timed_out, so that the update targets the correct table.
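
The idea can be sketched in framework-free Ruby. This is only an illustration under stated assumptions: the module, the two example classes, and the `timed_out_update_sql` helper are hypothetical names, and the SQL is assembled by hand here, whereas the real scope goes through ActiveRecord.

```ruby
# Hypothetical sketch, not GitLab's implementation.
module VerificationStateSketch
  # Table that holds the verification columns. Defaults to the model table.
  def verification_state_table_name
    table_name
  end

  # Column in that table that identifies the model record.
  def verification_state_model_key
    "id"
  end

  # Builds an UPDATE for timed-out verifications against whichever table
  # stores the state. String interpolation is for illustration only.
  def timed_out_update_sql(cutoff, start_id)
    table = verification_state_table_name
    key = verification_state_model_key

    %(UPDATE "#{table}" SET "verification_state" = 3 ) +
      %(WHERE "#{table}"."verification_state" = 1 ) +
      %(AND "#{table}"."verification_started_at" < '#{cutoff}' ) +
      %(AND "#{table}"."#{key}" >= #{Integer(start_id)})
  end
end

# State stored in the model table: the defaults apply.
class InlineModel
  extend VerificationStateSketch

  def self.table_name
    "terraform_state_versions"
  end
end

# State stored in a separate table: override the table name and key.
class SeparateStateModel
  extend VerificationStateSketch

  def self.table_name
    "lfs_objects"
  end

  def self.verification_state_table_name
    "lfs_object_states"
  end

  def self.verification_state_model_key
    "lfs_object_id"
  end
end

puts InlineModel.timed_out_update_sql("2021-12-29 10:09:29", 4)
puts SeparateStateModel.timed_out_update_sql("2021-12-29 10:11:27", 13)
```

Qualifying every column with the resolved table name is what keeps the statement valid in both cases, since the model table may not have the verification columns at all.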

Resolves: #349281 (closed)

Screenshots or screen recordings

The update_all query has been changed. It is called from Gitlab::Geo::VerificationState#fail_verification_timeouts.

Case 1: For models using a separate table for storing verification state, e.g. LfsObject using LfsObjectState, the query changes: the state table is now updated instead of the model table.

Before

Error

ActiveRecord::StatementInvalid: PG::UndefinedColumn: ERROR:  column "verification_state" of relation "lfs_objects" does not exist
LINE 1: UPDATE "lfs_objects" SET "verification_state" = 3, "verifica...
                                 ^

from /Users/aakriti/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params'
Caused by PG::UndefinedColumn: ERROR:  column "verification_state" of relation "lfs_objects" does not exist
LINE 1: UPDATE "lfs_objects" SET "verification_state" = 3, "verifica...
                                 ^

from /Users/aakriti/.asdf/installs/ruby/2.7.5/lib/ruby/gems/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params'
[5] pry(main)>

Query

UPDATE
  "lfs_objects"
SET
  "verification_state" = 3,
  "verification_failure" = 'Verification timed out after 28800',
  "verification_checksum" = NULL,
  "verification_retry_count" = 1,
  "verification_retry_at" = '2021-12-29 18:14:02.360569',
  "verified_at" = '2021-12-29 18:13:04.360691'
WHERE
  "lfs_objects"."id" IN (
    SELECT
      "lfs_objects"."id"
    FROM
      "lfs_objects"
      INNER JOIN "lfs_object_states" ON "lfs_object_states"."lfs_object_id" = "lfs_objects"."id"
    WHERE
      "lfs_object_states"."verification_state" = 1
      AND (verification_started_at < '2021-12-29 10:13:04.361232')
      AND "lfs_objects"."id" >= 13)

After

UPDATE
  "lfs_object_states"
SET
  "verification_state" = 3,
  "verification_failure" = 'Verification timed out after 28800',
  "verification_checksum" = NULL,
  "verification_retry_count" = 1,
  "verification_retry_at" = '2021-12-29 18:12:35.003584',
  "verified_at" = '2021-12-29 18:11:27.003670'
WHERE
  "lfs_object_states"."lfs_object_id" IN (
    SELECT
      "lfs_objects"."id"
    FROM
      "lfs_objects"
      INNER JOIN "lfs_object_states" ON "lfs_object_states"."lfs_object_id" = "lfs_objects"."id"
    WHERE
      "lfs_object_states"."verification_state" = 1
      AND "lfs_object_states"."verification_started_at" < '2021-12-29 10:11:27.004924')
  AND "lfs_object_states"."lfs_object_id" >= 13

Case 2: For models storing verification state in the same table as the model, e.g. Terraform::StateVersion, the query remains unchanged. The model table is updated.

Before

UPDATE
  "terraform_state_versions"
SET
  "verification_state" = 3,
  "verification_failure" = 'Verification timed out after 28800',
  "verification_checksum" = NULL,
  "verification_retry_count" = 1,
  "verification_retry_at" = '2021-12-29 18:14:42.667748',
  "verified_at" = '2021-12-29 18:14:04.667816'
WHERE
  "terraform_state_versions"."file_store" = 1
  AND ("terraform_state_versions"."verification_state" IN (1))
  AND (verification_started_at < '2021-12-29 10:14:04.668346')
  AND "terraform_state_versions"."id" >= 4

After

UPDATE
  "terraform_state_versions"
SET
  "verification_state" = 3,
  "verification_failure" = 'Verification timed out after 28800',
  "verification_checksum" = NULL,
  "verification_retry_count" = 1,
  "verification_retry_at" = '2021-12-29 18:10:25.912174',
  "verified_at" = '2021-12-29 18:09:29.912256'
WHERE
  "terraform_state_versions"."file_store" = 1
  AND ("terraform_state_versions"."verification_state" IN (1))
  AND "terraform_state_versions"."verification_started_at" < '2021-12-29 10:09:29.912654'
  AND "terraform_state_versions"."id" >= 4

How to set up and validate locally

Case 1: Separately stored verification state

  1. Create an LfsObject object, e.g.:
    # File.expand_path resolves "~"; Rails.root.join would treat it as a literal directory name
    f = File.open(File.expand_path("~/gdk/file1.txt"))
    LfsObject.new(oid: "b68143e6463773b1b6c6fd009a76c32aeec041faff32ba2ed42fd267785", size: 499013, file: f, file_store: 1).save!
  2. Update the automatically created LfsObjectState object so that it is picked up as a timed out verification:
    l_state = Geo::LfsObjectState.last
    l_state.update_attribute(:verification_state, 1)
    l_state.update_attribute(:verification_started_at, (Gitlab::Geo::VerificationState::VERIFICATION_TIMEOUT + 3.hours).ago)
    l_state.save!
  3. Run ::LfsObject.fail_verification_timeouts
  4. Check that ::LfsObject.last.verification_failed? returns true
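
Step 2 backdates verification_started_at past the timeout so that the record matches the timed-out scope. A minimal, framework-free sketch of that comparison follows; it assumes the 28800-second timeout that appears in the failure message of the queries above, and `timed_out?` is a hypothetical helper, not part of GitLab.

```ruby
# Timeout in seconds, taken from the "Verification timed out after 28800" message.
VERIFICATION_TIMEOUT = 28_800

# Hypothetical helper: true when verification started before the cutoff.
def timed_out?(started_at, now)
  started_at < now - VERIFICATION_TIMEOUT
end

now = Time.utc(2021, 12, 29, 18, 11, 27)

# Mirrors (VERIFICATION_TIMEOUT + 3.hours).ago from the steps above.
backdated = now - (VERIFICATION_TIMEOUT + 3 * 3600)
# A verification that started an hour ago is still within the timeout.
recent = now - 3600

puts timed_out?(backdated, now)
puts timed_out?(recent, now)
```

The extra three hours are a safety margin: any start time older than the timeout would be picked up.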

Case 2: Verification state stored in the model table

  1. Create a Terraform::State object: Terraform::State.new(project_id: 18, name: "terraform-state").save!
  2. Create a corresponding Terraform::StateVersion object:
    f = Pathname.new(Rails.root.join("/Users/aakriti/Development/gdk/gitlab/file.txt")).open
    Terraform::StateVersion.new(terraform_state_id: 1, version: 1, file_store: 1, file: f).save!
    t = Terraform::StateVersion.last
  3. Update the new object so that it is picked up as a timed out verification:
    t.update_attribute(:verification_state, 1)
    t.update_attribute(:verification_started_at, (Gitlab::Geo::VerificationState::VERIFICATION_TIMEOUT + 3.hours).ago)
    t.save!
  4. Run Terraform::StateVersion.fail_verification_timeouts
  5. Check that Terraform::StateVersion.last.verification_failed? returns true

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
