Skip to content

Improve performance of verification queries

Aakriti Gupta requested to merge ag-adapt-verification-queries into master

What does this MR do and why?

Resolves: #338598 (closed)

Use of caffeine recommended for review.

This MR makes it so that the performance of verification queries is improved for the cases where the verification details of a replicable are stored in a separate table, e.g. for External Merge Request Diffs. This is done in the backend by splitting the common scope used in verification queries, available_verifiables into two partially overlapping scopes, verifiables and available_verifiables.

available_verifiables was the scope declared in VerifiableRegistry and ReplicableModel. It was used such that it included all available_replicables with added filters where applicable, e.g. whether files were stored externally in the case of External Merge Request Diff.

Essentially, verifiables are those records in the replicable's table that should be checksummed on primary or have been replicated on secondary. But available_verifiables are those replicable records that contain verification details and this scope can be directly queried upon on the verification details table in e.g., the VerificationState queries.

In replicables where verification state is stored in a separate table, available_verifiables is overriden in the replicable model to join on the separate table, e.g. MergeRequestDiffDetail for the replicable model MergeRequestDiff.

In others, the scope applies to the replicable model itself, e.g. GroupWikiRepository and is equivalent to the scope verifiables.

At the database level, the queries are altered only for datatypes like merge_request_diffs where a corresponding table merge_Request_diff_details is used. The queries are adapted such that, some filters are no longer needed. These filters (stored_externally, e.g.) were used when creating records in the merge_request_diff_details table, so querying with them is redundant. This MR removes these extra filters from the queries.

A note on the status of mergability of this MR

Update: the MR is now mergable.

I have removed the temporarily quarantined QA tests. There were still broken QA tests. Instead of testing out the deb package, I can verify this MR out on staging, once it is merged and deployed.

The MR is already verified on a local setup.

The last few commits are temporary. They quarantine some failing QA tests. This is done to get a deb package out of the package-and-qa pipeline to test this MR on a GET setup.

This commit will eventually be removed.

I will then update the Draft status of the MR to make it mergeable.

For database review - modified queries

The queries were run on a local gdk setup because the merge_request_diff_details table is not filled on production (Geo is not enabled) and on staging, it has very few records.

Please see comment for query plans.


For backend review - SQL for Verification queries: before and after

Case 1: Using separate verification table for MergeRequestDiff

Only start_verification_batch query is changed.

Click to see queries
start_verification_batch 
--------------------------
# BEFORE

UPDATE merge_request_diff_details
SET "verification_state" = 1,
    "verification_started_at" = NOW()
WHERE merge_request_diff_id IN
    (SELECT merge_request_diff_id
     FROM "merge_request_diffs"
     INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
     WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
                                                  'empty'))
       AND "merge_request_diffs"."stored_externally" = TRUE
       AND "merge_request_diffs"."external_diff_store" = 1
       AND "merge_request_diff_details"."verification_state" = 3
       AND ("merge_request_diff_details"."verification_retry_at" IS NULL
            OR "merge_request_diff_details"."verification_retry_at" < '2021-10-29 16:34:32.359269')
     ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
     FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id

# AFTER
  UPDATE merge_request_diff_details
  SET "verification_state" = 1,
      "verification_started_at" = NOW() WHERE merge_request_diff_id IN
    (SELECT merge_request_diff_id
     FROM "merge_request_diffs"
     INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
     WHERE "merge_request_diff_details"."verification_state" = 0
     ORDER BY verified_at ASC NULLS FIRST LIMIT 10
     FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id 

Verification pending 
-------------------
# BEFORE

SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
                                             'empty'))
  AND "merge_request_diffs"."stored_externally" = TRUE
  AND "merge_request_diffs"."external_diff_store" = 1
  AND "merge_request_diff_details"."verification_state" = 0
ORDER BY verified_at ASC NULLS FIRST LIMIT 10

# AFTER
  SELECT "merge_request_diffs".*
  FROM "merge_request_diffs"
  INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id" WHERE "merge_request_diff_details"."verification_state" = 0
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 

Verification failed 
-------------------
# BEFORE

SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
                                             'empty'))
  AND "merge_request_diffs"."stored_externally" = TRUE
  AND "merge_request_diffs"."external_diff_store" = 1
  AND "merge_request_diff_details"."verification_state" = 3
  AND ("merge_request_diff_details"."verification_retry_at" IS NULL
       OR "merge_request_diff_details"."verification_retry_at" < '2021-10-29 16:34:32.359269')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9

# AFTER
SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE "merge_request_diff_details"."verification_state" = 3
  AND ("merge_request_diff_details"."verification_retry_at" IS NULL
       OR "merge_request_diff_details"."verification_retry_at" < '2021-10-29 16:19:16.185384')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9 

---------------------
**Registry Queries (all unchanged) **
---------------------

start_verification_batch 
--------------
# BEFORE
UPDATE merge_request_diff_registry
SET "verification_state" = 1,
    "verification_started_at" = NOW()
WHERE merge_request_diff_id IN
    (SELECT "merge_request_diff_registry"."merge_request_diff_id"
     FROM "merge_request_diff_registry"
     WHERE ("merge_request_diff_registry"."state" IN (2))
       AND ("merge_request_diff_registry"."verification_state" IN (0))
     ORDER BY verified_at ASC NULLS FIRST LIMIT 10
     FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id 

# AFTER
UPDATE merge_request_diff_registry
SET "verification_state" = 1,
    "verification_started_at" = NOW()
WHERE merge_request_diff_id IN
    (SELECT "merge_request_diff_registry"."merge_request_diff_id"
     FROM "merge_request_diff_registry"
     WHERE ("merge_request_diff_registry"."state" IN (2))
       AND ("merge_request_diff_registry"."verification_state" IN (0))
     ORDER BY verified_at ASC NULLS FIRST LIMIT 10
     FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id 

Verification pending 
-------------------
# BEFORE
SELECT "merge_request_diff_registry".*
FROM "merge_request_diff_registry"
WHERE ("merge_request_diff_registry"."state" IN (2))
  AND ("merge_request_diff_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 

# AFTER

SELECT "merge_request_diff_registry".*
FROM "merge_request_diff_registry"
WHERE ("merge_request_diff_registry"."state" IN (2))
  AND ("merge_request_diff_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 

Verification failed
 -------------------
# BEFORE
  SELECT "merge_request_diff_registry".*
  FROM "merge_request_diff_registry" WHERE ("merge_request_diff_registry"."state" IN (2))
  AND ("merge_request_diff_registry"."verification_state" IN (3))
  AND ("merge_request_diff_registry"."verification_retry_at" IS NULL
       OR "merge_request_diff_registry"."verification_retry_at" < '2021-10-29 16:34:33.000263')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9

# AFTER
  SELECT "merge_request_diff_registry".*
  FROM "merge_request_diff_registry" WHERE ("merge_request_diff_registry"."state" IN (2))
  AND ("merge_request_diff_registry"."verification_state" IN (3))
  AND ("merge_request_diff_registry"."verification_retry_at" IS NULL
       OR "merge_request_diff_registry"."verification_retry_at" < '2021-10-29 16:42:01.805050')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9

Case 2: Verification details in model table, Terraform::StateVersion

All queries are unchanged.

Click to expand
start_verification_batch 
--------------------------

# BEFORE
UPDATE terraform_state_versions
SET "verification_state" = 1,
    "verification_started_at" = NOW()
WHERE id IN
    (SELECT "terraform_state_versions"."id"
     FROM "terraform_state_versions"
     WHERE "terraform_state_versions"."file_store" = 1
       AND ("terraform_state_versions"."verification_state" IN (0))
     ORDER BY verified_at ASC NULLS FIRST LIMIT 10
     FOR UPDATE SKIP LOCKED) RETURNING id 

# AFTER
  UPDATE terraform_state_versions
  SET "verification_state" = 1,
      "verification_started_at" = NOW() WHERE id IN
    (SELECT "terraform_state_versions"."id"
     FROM "terraform_state_versions"
     WHERE "terraform_state_versions"."file_store" = 1
       AND ("terraform_state_versions"."verification_state" IN (0))
     ORDER BY verified_at ASC NULLS FIRST LIMIT 10
     FOR UPDATE SKIP LOCKED) RETURNING id 

Verification pending 
-------------------
# BEFORE

SELECT "terraform_state_versions".*
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
  AND ("terraform_state_versions"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 

# AFTER
SELECT "terraform_state_versions".*
FROM "terraform_state_versions" 
WHERE "terraform_state_versions"."file_store" = 1
  AND ("terraform_state_versions"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 

Verification failed 
-------------------

# BEFORE
SELECT "terraform_state_versions".*
FROM "terraform_state_versions" 
WHERE "terraform_state_versions"."file_store" = 1
  AND ("terraform_state_versions"."verification_state" IN (3))
  AND ("terraform_state_versions"."verification_retry_at" IS NULL
       OR "terraform_state_versions"."verification_retry_at" < '2021-10-29 16:39:46.733845')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9

# AFTER
SELECT "terraform_state_versions".*
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
  AND ("terraform_state_versions"."verification_state" IN (3))
  AND ("terraform_state_versions"."verification_retry_at" IS NULL
       OR "terraform_state_versions"."verification_retry_at" < '2021-10-29 16:23:32.369965')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9 

---------------------
** Registry Queries**
---------------------

start_verification_batch 
-------------------------

# BEFORE
UPDATE terraform_state_version_registry
SET "verification_state" = 1,
    "verification_started_at" = NOW()
WHERE terraform_state_version_id IN
    (SELECT "terraform_state_version_registry"."terraform_state_version_id"
     FROM "terraform_state_version_registry"
     WHERE ("terraform_state_version_registry"."state" IN (2))
       AND ("terraform_state_version_registry"."verification_state" IN (3))
       AND ("terraform_state_version_registry"."verification_retry_at" IS NULL
            OR "terraform_state_version_registry"."verification_retry_at" < '2021-10-29 16:31:56.060168')
     ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
     FOR UPDATE SKIP LOCKED) RETURNING terraform_state_version_id

# AFTER
UPDATE terraform_state_version_registry
SET "verification_state" = 1,
    "verification_started_at" = NOW()
WHERE terraform_state_version_id IN
    (SELECT "terraform_state_version_registry"."terraform_state_version_id"
     FROM "terraform_state_version_registry"
     WHERE ("terraform_state_version_registry"."state" IN (2))
       AND ("terraform_state_version_registry"."verification_state" IN (0))
     ORDER BY verified_at ASC NULLS FIRST LIMIT 10
     FOR UPDATE SKIP LOCKED) RETURNING terraform_state_version_id 

Verification pending 
-------------------
# BEFORE
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
  AND ("terraform_state_version_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 start_verification_batch


# AFTER
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
  AND ("terraform_state_version_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 

Verification failed 
-------------------
# BEFORE
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
  AND ("terraform_state_version_registry"."verification_state" IN (3))
  AND ("terraform_state_version_registry"."verification_retry_at" IS NULL
       OR "terraform_state_version_registry"."verification_retry_at" < '2021-10-29 16:31:56.060168')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9

# AFTER
  SELECT "terraform_state_version_registry".*
  FROM "terraform_state_version_registry" WHERE ("terraform_state_version_registry"."state" IN (2))
  AND ("terraform_state_version_registry"."verification_state" IN (3))
  AND ("terraform_state_version_registry"."verification_retry_at" IS NULL
       OR "terraform_state_version_registry"."verification_retry_at" < '2021-10-29 16:45:45.624111')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9 

Queries run by Geo::VerificationStateBackfillService from VerificationState

Includes use of the new verifiables scope.

Click to expand
pluck_verifiable_ids_in_range (using the scope `verifiables`)
------------------------------

SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
                                             'empty'))
  AND "merge_request_diffs"."stored_externally" = TRUE
  AND "merge_request_diffs"."external_diff_store" = 1
  AND "merge_request_diffs"."id" BETWEEN 1 AND 1227 

pluck_verification_details_ids_in_range 
---------------------------------------

SELECT "merge_request_diff_details".*
FROM "merge_request_diff_details" 
WHERE "merge_request_diff_details"."merge_request_diff_id" BETWEEN 1 AND 1227

How to set up and validate locally

What do we want to test?

  1. Are verification queries correct for the cases where verification details are stored in a separate and when stored on the model's table?

1a. For separate table, extra filters should not be included, e.g. MergeRequestDiff

  • Make sure all replicable MergeRquestDiff records (e.g. stored_externally = true) have corresponding MergeRequestDiffDetail objects, else create them.
  • Check sql run for verification updates. See queries above to know what the SQL should look like.

1b. For same table, extra filters should be included, use Terraform::StateVersion

  • Create/confirm a few Terraform::StateVersion objects
  • Check sql run for verification updates. See queries above to know what the SQL should look like.
  1. Smoke testing for verification on primary and secondary

Screenshot from locally run primary Screenshot_2021-11-08_at_18.26.15

2a. Confirm External MR Diff verification is working (separate verification details table)

  • on primary
  • on secondary

2b. Confirm Terraform State Version verification is working (replicable model table used for verification details)

  • on primary
  • on secondary

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono

Merge request reports