Geo: Many Postgres locks on Tracking Database from VerificationTimeoutWorker
Problem
Example lock on the Geo Tracking DB found, out of very many similar locks:
961236,"{961576,958296}","postgres","REDACTED","REDACTED","active","2022-12-05 13:19:30.031488+00","08:19:45.693908","Lock","tuple","/*application:sidekiq,correlation_id:aa617b5ce294f7f4150fd052a5d8aeac,jid:be9a41a4fb2f5208e2a89f58,endpoint_id:Geo::VerificationTimeoutWorker,db_config_name:geo*/ UPDATE ""job_artifact_registry"" SET ""verification_state"" = 3, ""verification_failure"" = 'Verification timed out after 28800', ""verification_checksum"" = NULL, ""verification_retry_count"" = 1, ""verification_retry_at"" = '2022-12-05 13:03:43.393410', ""verified_at"" = '2022-12-05 13:02:55.393473' WHERE (""job_artifact_registry"".""state"" IN (2)) AND ""job_artifact_registry"".""verification_state"" = 1 AND ""job_artifact_registry"".""verification_started_at"" < '2022-12-05 05:02:55.396449' AND ""job_artifact_registry"".""id"" >= 71522214"
All of the locks found were for queries by VerificationTimeoutWorker operating on job_artifact_registry.
We don't need/want more than one of these jobs to run at a time per table. This job gets enqueued every minute. I see the worker is marked idempotent! which is supposed to use deduplication strategy :until_executing by default. I think what's happening is each time a Sidekiq worker picks up one of these jobs, then another one gets enqueued, and then runs into this lock.
Possible solution
Make VerificationTimeoutWorker use the job deduplication strategy :until_executed so no more jobs get enqueued while one is executing, like:
deduplicate :until_executed
idempotent!
Possible workaround
Cancel the VerificationTimeoutWorker locks https://stackoverflow.com/questions/1063043/how-to-release-possible-postgres-row-locks
It sounds like that should cancel the query as well, which I assume should raise an error in the Sidekiq job if it is still running. We want the job to stop running since more will be enqueued anyway.