Geo: Fix concurrent VerificationBatchWorker jobs
Problem
We're getting errors while checksumming package files on the staging.gitlab.com primary.
Today I briefly enabled geo_package_file_verification with /chatops run feature set geo_package_file_verification true --staging. There are no package files stored locally on staging.gitlab.com, and sync_object_storage is disabled on the GeoNode record for geo.staging.gitlab.com, so there is nothing to verify. Even so, the primary site raised DB errors in Geo::VerificationBatchWorker:
https://sentry.gitlab.net/gitlab/staginggitlabcom/issues/2465297/?query=is%3Aunresolved
ActiveRecord::Deadlocked
PG::TRDeadlockDetected: ERROR: deadlock detected
DETAIL: Process 1306 waits for ShareLock on transaction 2661053543; blocked by process 13203.
Process 13203 waits for ShareLock on transaction 2661053539; blocked by process 1306.
HINT: See server log for query details.
CONTEXT: while rechecking updated tuple (28,46) in relation "packages_package_files"
https://sentry.gitlab.net/gitlab/staginggitlabcom/issues/2465298/?query=is%3Aunresolved
ActiveRecord::ValueTooLong
PG::StringDataRightTruncation: ERROR: value too long for type character varying(255)
https://sentry.gitlab.net/gitlab/staginggitlabcom/issues/2465299/?query=is%3Aunresolved
ActiveRecord::QueryCanceled
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
CONTEXT: while updating tuple (445,5) in relation "packages_package_files"
https://sentry.gitlab.net/gitlab/staginggitlabcom/issues/2465308/?query=is%3Aunresolved
ActiveRecord::QueryCanceled
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
CONTEXT: while rechecking updated tuple (134,32) in relation "packages_package_files"
Proposal
From @mbobin's excellent comment below:
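Use .lock('FOR UPDATE SKIP LOCKED') at https://gitlab.com/gitlab-org/gitlab/-/blob/v13.8.2-ee/ee/lib/gitlab/geo/verification_state.rb#L158, so that concurrent VerificationBatchWorker jobs skip rows another transaction has already locked instead of waiting on them.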
Existing example: https://gitlab.com/gitlab-org/gitlab/-/blob/v13.8.2-ee/app/models/ci/deleted_object.rb#L17 which is called from https://gitlab.com/gitlab-org/gitlab/-/blob/v13.8.2-ee/app/services/ci/delete_objects_service.rb#L41.
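To illustrate the behaviour the proposal relies on, here is a minimal pure-Ruby, in-memory analogy of FOR UPDATE SKIP LOCKED. It does not touch PostgreSQL or ActiveRecord: Row, build_rows, claim_batch and finish_batch are hypothetical names, and each row's Mutex stands in for a row-level lock. Each worker tries to lock candidate rows without waiting and skips any row another worker already holds, so concurrent batches do not overlap and do not block on each other. In the real code the clause would be passed to ActiveRecord's QueryMethods#lock on the relation that selects the batch.

```ruby
# Hypothetical in-memory analogy of SELECT ... FOR UPDATE SKIP LOCKED.
# Not PostgreSQL: each Row carries a Mutex standing in for a row-level lock.
Row = Struct.new(:id, :lock, :state)

def build_rows(count)
  (1..count).map { |id| Row.new(id, Mutex.new, :pending) }
end

# Claim up to batch_size pending rows. Rows locked by another worker are
# skipped rather than waited on, which is the SKIP LOCKED behaviour.
def claim_batch(rows, batch_size)
  claimed = []
  rows.each do |row|
    break if claimed.size >= batch_size
    # try_lock returns false immediately if the lock is held; it never blocks.
    next unless row.lock.try_lock

    if row.state == :pending
      claimed << row
    else
      # Already verified by an earlier batch: release it and move on.
      row.lock.unlock
    end
  end
  claimed
end

# Mark the batch verified, then release its locks (like committing the transaction).
def finish_batch(batch)
  batch.each do |row|
    row.state = :verified
    row.lock.unlock
  end
end

rows = build_rows(10)
worker_a = claim_batch(rows, 4) # locks rows 1..4
worker_b = claim_batch(rows, 4) # skips rows 1..4, locks rows 5..8
p worker_a.map(&:id)            # prints [1, 2, 3, 4]
p worker_b.map(&:id)            # prints [5, 6, 7, 8]
finish_batch(worker_a)
finish_batch(worker_b)
```

In production the two workers would be separate Sidekiq jobs, each in its own database transaction. Because neither blocks on rows the other has already claimed, they avoid the kind of mutual wait shown in the deadlock report, and a busy batch shrinks rather than stalling until the statement timeout.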