Improve performance of verification queries
What does this MR do and why?
Resolves: #338598 (closed)
Use of caffeine recommended for review.
This MR makes it so that the performance of verification queries is improved for the cases where the verification details of a replicable are stored in a separate table, e.g. for External Merge Request Diffs. This is done in the backend by splitting the common scope used in verification queries, available_verifiables
into two partially overlapping scopes, verifiables
and available_verifiables
.
available_verifiables
was the scope declared in VerifiableRegistry
and ReplicableModel
. It was used such that it included all available_replicables
with added filters where applicable, e.g. whether files were stored externally in the case of External Merge Request Diff.
Essentially, verifiables
are those records in the replicable's table that should be checksummed on primary or have been replicated on secondary. But available_verifiables
are those replicable records that contain verification details and this scope can be directly queried upon on the verification details table in e.g., the VerificationState
queries.
In replicables where verification state is stored in a separate table, available_verifiables
is overriden in the replicable model to join
on the separate table, e.g. MergeRequestDiffDetail
for the replicable model MergeRequestDiff
.
In others, the scope applies to the replicable model itself, e.g. GroupWikiRepository
and is equivalent to the scope verifiables
.
At the database level, the queries are altered only for datatypes like merge_request_diffs
where a corresponding table merge_Request_diff_details
is used. The queries are adapted such that, some filters are no longer needed. These filters (stored_externally
, e.g.) were used when creating records in the merge_request_diff_details
table, so querying with them is redundant. This MR removes these extra filters from the queries.
A note on the status of mergability of this MR
Update: the MR is now mergable.
I have removed the temporarily quarantined QA tests. There were still broken QA tests. Instead of testing out the deb package, I can verify this MR out on staging, once it is merged and deployed.
The MR is already verified on a local setup.
The last few commits are temporary. They quarantine some failing QA tests. This is done to get a deb package out of the package-and-qa
pipeline to test this MR on a GET setup.
This commit will eventually be removed.
I will then update the Draft status of the MR to make it mergeable.
For database review - modified queries
The queries were run on a local gdk setup because the merge_request_diff_details
table is not filled on production (Geo is not enabled) and on staging, it has very few records.
Please see comment for query plans.
For backend review - SQL for Verification queries: before and after
MergeRequestDiff
Case 1: Using separate verification table for Only start_verification_batch
query is changed.
Click to see queries
start_verification_batch
--------------------------
# BEFORE
UPDATE merge_request_diff_details
SET "verification_state" = 1,
"verification_started_at" = NOW()
WHERE merge_request_diff_id IN
(SELECT merge_request_diff_id
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
'empty'))
AND "merge_request_diffs"."stored_externally" = TRUE
AND "merge_request_diffs"."external_diff_store" = 1
AND "merge_request_diff_details"."verification_state" = 3
AND ("merge_request_diff_details"."verification_retry_at" IS NULL
OR "merge_request_diff_details"."verification_retry_at" < '2021-10-29 16:34:32.359269')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id
# AFTER
UPDATE merge_request_diff_details
SET "verification_state" = 1,
"verification_started_at" = NOW() WHERE merge_request_diff_id IN
(SELECT merge_request_diff_id
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE "merge_request_diff_details"."verification_state" = 0
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id
Verification pending
-------------------
# BEFORE
SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
'empty'))
AND "merge_request_diffs"."stored_externally" = TRUE
AND "merge_request_diffs"."external_diff_store" = 1
AND "merge_request_diff_details"."verification_state" = 0
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
# AFTER
SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id" WHERE "merge_request_diff_details"."verification_state" = 0
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
Verification failed
-------------------
# BEFORE
SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
'empty'))
AND "merge_request_diffs"."stored_externally" = TRUE
AND "merge_request_diffs"."external_diff_store" = 1
AND "merge_request_diff_details"."verification_state" = 3
AND ("merge_request_diff_details"."verification_retry_at" IS NULL
OR "merge_request_diff_details"."verification_retry_at" < '2021-10-29 16:34:32.359269')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
# AFTER
SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
INNER JOIN "merge_request_diff_details" ON "merge_request_diff_details"."merge_request_diff_id" = "merge_request_diffs"."id"
WHERE "merge_request_diff_details"."verification_state" = 3
AND ("merge_request_diff_details"."verification_retry_at" IS NULL
OR "merge_request_diff_details"."verification_retry_at" < '2021-10-29 16:19:16.185384')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
---------------------
**Registry Queries (all unchanged) **
---------------------
start_verification_batch
--------------
# BEFORE
UPDATE merge_request_diff_registry
SET "verification_state" = 1,
"verification_started_at" = NOW()
WHERE merge_request_diff_id IN
(SELECT "merge_request_diff_registry"."merge_request_diff_id"
FROM "merge_request_diff_registry"
WHERE ("merge_request_diff_registry"."state" IN (2))
AND ("merge_request_diff_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id
# AFTER
UPDATE merge_request_diff_registry
SET "verification_state" = 1,
"verification_started_at" = NOW()
WHERE merge_request_diff_id IN
(SELECT "merge_request_diff_registry"."merge_request_diff_id"
FROM "merge_request_diff_registry"
WHERE ("merge_request_diff_registry"."state" IN (2))
AND ("merge_request_diff_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
FOR UPDATE SKIP LOCKED) RETURNING merge_request_diff_id
Verification pending
-------------------
# BEFORE
SELECT "merge_request_diff_registry".*
FROM "merge_request_diff_registry"
WHERE ("merge_request_diff_registry"."state" IN (2))
AND ("merge_request_diff_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
# AFTER
SELECT "merge_request_diff_registry".*
FROM "merge_request_diff_registry"
WHERE ("merge_request_diff_registry"."state" IN (2))
AND ("merge_request_diff_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
Verification failed
-------------------
# BEFORE
SELECT "merge_request_diff_registry".*
FROM "merge_request_diff_registry" WHERE ("merge_request_diff_registry"."state" IN (2))
AND ("merge_request_diff_registry"."verification_state" IN (3))
AND ("merge_request_diff_registry"."verification_retry_at" IS NULL
OR "merge_request_diff_registry"."verification_retry_at" < '2021-10-29 16:34:33.000263')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
# AFTER
SELECT "merge_request_diff_registry".*
FROM "merge_request_diff_registry" WHERE ("merge_request_diff_registry"."state" IN (2))
AND ("merge_request_diff_registry"."verification_state" IN (3))
AND ("merge_request_diff_registry"."verification_retry_at" IS NULL
OR "merge_request_diff_registry"."verification_retry_at" < '2021-10-29 16:42:01.805050')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
Terraform::StateVersion
Case 2: Verification details in model table, All queries are unchanged.
Click to expand
start_verification_batch
--------------------------
# BEFORE
UPDATE terraform_state_versions
SET "verification_state" = 1,
"verification_started_at" = NOW()
WHERE id IN
(SELECT "terraform_state_versions"."id"
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
AND ("terraform_state_versions"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
FOR UPDATE SKIP LOCKED) RETURNING id
# AFTER
UPDATE terraform_state_versions
SET "verification_state" = 1,
"verification_started_at" = NOW() WHERE id IN
(SELECT "terraform_state_versions"."id"
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
AND ("terraform_state_versions"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
FOR UPDATE SKIP LOCKED) RETURNING id
Verification pending
-------------------
# BEFORE
SELECT "terraform_state_versions".*
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
AND ("terraform_state_versions"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
# AFTER
SELECT "terraform_state_versions".*
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
AND ("terraform_state_versions"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
Verification failed
-------------------
# BEFORE
SELECT "terraform_state_versions".*
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
AND ("terraform_state_versions"."verification_state" IN (3))
AND ("terraform_state_versions"."verification_retry_at" IS NULL
OR "terraform_state_versions"."verification_retry_at" < '2021-10-29 16:39:46.733845')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
# AFTER
SELECT "terraform_state_versions".*
FROM "terraform_state_versions"
WHERE "terraform_state_versions"."file_store" = 1
AND ("terraform_state_versions"."verification_state" IN (3))
AND ("terraform_state_versions"."verification_retry_at" IS NULL
OR "terraform_state_versions"."verification_retry_at" < '2021-10-29 16:23:32.369965')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
---------------------
** Registry Queries**
---------------------
start_verification_batch
-------------------------
# BEFORE
UPDATE terraform_state_version_registry
SET "verification_state" = 1,
"verification_started_at" = NOW()
WHERE terraform_state_version_id IN
(SELECT "terraform_state_version_registry"."terraform_state_version_id"
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
AND ("terraform_state_version_registry"."verification_state" IN (3))
AND ("terraform_state_version_registry"."verification_retry_at" IS NULL
OR "terraform_state_version_registry"."verification_retry_at" < '2021-10-29 16:31:56.060168')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
FOR UPDATE SKIP LOCKED) RETURNING terraform_state_version_id
# AFTER
UPDATE terraform_state_version_registry
SET "verification_state" = 1,
"verification_started_at" = NOW()
WHERE terraform_state_version_id IN
(SELECT "terraform_state_version_registry"."terraform_state_version_id"
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
AND ("terraform_state_version_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
FOR UPDATE SKIP LOCKED) RETURNING terraform_state_version_id
Verification pending
-------------------
# BEFORE
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
AND ("terraform_state_version_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10 start_verification_batch
# AFTER
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
AND ("terraform_state_version_registry"."verification_state" IN (0))
ORDER BY verified_at ASC NULLS FIRST LIMIT 10
Verification failed
-------------------
# BEFORE
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry"
WHERE ("terraform_state_version_registry"."state" IN (2))
AND ("terraform_state_version_registry"."verification_state" IN (3))
AND ("terraform_state_version_registry"."verification_retry_at" IS NULL
OR "terraform_state_version_registry"."verification_retry_at" < '2021-10-29 16:31:56.060168')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
# AFTER
SELECT "terraform_state_version_registry".*
FROM "terraform_state_version_registry" WHERE ("terraform_state_version_registry"."state" IN (2))
AND ("terraform_state_version_registry"."verification_state" IN (3))
AND ("terraform_state_version_registry"."verification_retry_at" IS NULL
OR "terraform_state_version_registry"."verification_retry_at" < '2021-10-29 16:45:45.624111')
ORDER BY verification_retry_at ASC NULLS FIRST LIMIT 9
Geo::VerificationStateBackfillService
from VerificationState
Queries run by Includes use of the new verifiables
scope.
Click to expand
pluck_verifiable_ids_in_range (using the scope `verifiables`)
------------------------------
SELECT "merge_request_diffs".*
FROM "merge_request_diffs"
WHERE ("merge_request_diffs"."state" NOT IN ('without_files',
'empty'))
AND "merge_request_diffs"."stored_externally" = TRUE
AND "merge_request_diffs"."external_diff_store" = 1
AND "merge_request_diffs"."id" BETWEEN 1 AND 1227
pluck_verification_details_ids_in_range
---------------------------------------
SELECT "merge_request_diff_details".*
FROM "merge_request_diff_details"
WHERE "merge_request_diff_details"."merge_request_diff_id" BETWEEN 1 AND 1227
How to set up and validate locally
What do we want to test?
- Are verification queries correct for the cases where verification details are stored in a separate and when stored on the model's table?
1a. For separate table, extra filters should not be included, e.g. MergeRequestDiff
-
Make sure all replicable MergeRquestDiff
records (e.g.stored_externally
= true) have correspondingMergeRequestDiffDetail
objects, else create them. -
Check sql run for verification updates. See queries above to know what the SQL should look like.
1b. For same table, extra filters should be included, use Terraform::StateVersion
-
Create/confirm a few Terraform::StateVersion
objects -
Check sql run for verification updates. See queries above to know what the SQL should look like.
- Smoke testing for verification on primary and secondary
Screenshot from locally run primary
2a. Confirm External MR Diff verification is working (separate verification details table)
-
on primary -
on secondary
2b. Confirm Terraform State Version verification is working (replicable model table used for verification details)
-
on primary -
on secondary
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.