Ci::PipelineSuccessUnlockArtifactsWorker times out updating ci_job_artifacts table
From https://log.gprd.gitlab.net/goto/609ba7a0-55c7-11ed-8d37-e9a2f393ea2a, we see that Ci::PipelineSuccessUnlockArtifactsWorker
occasionally hits a statement timeout updating the ci_job_artifacts
table. This happened 82 times in the past week:
The SQL query:
UPDATE "ci_job_artifacts" SET "locked" = $1 WHERE "ci_job_artifacts"."job_id" IN (SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" = $2 AND "ci_builds"."commit_id" IN ($3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48, $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65, $66, $67, $68, $69, $70, $71, $72, $73, $74, $75, $76, $77, $78, $79, $80, $81, $82, $83, $84, $85, $86, $87, $88, $89, $90, $91, $92, $93, $94, $95, $96, $97, $98, $99, $100, $101, $102)) RETURNING ("ci_job_artifacts"."id")
As seen above, this query updates job artifact rows that belong to 102 pipelines (ci_build.commit_id). On large project with a lot of jobs and job artifacts such as gitlab-org/gitlab
, this could mean updating more than 10000 rows in one go.
This query runs in the following transaction.
::Ci::Pipeline.transaction do
unlocked_pipelines = unlock_pipelines(ci_ref, before_pipeline)
unlocked_job_artifacts = unlock_job_artifacts(unlocked_pipelines) # query that times out
results[:unlocked_pipeline_artifacts] += unlock_pipeline_artifacts(unlocked_pipelines)
end
As a result of the timeout, the transaction would be rolled back.
Proposal
Instead of unlocking all pipelines that need to be unlocked and all job artifacts and all pipeline artifacts belonging to those pipelines in a single transaction, we could distribute the workload to unlocking individual pipeline by putting them into a queue.
The UnlockArtifactsService where everything currently happens will change to only select pipeline IDs that need to be unlocked and place these pipeline IDs into a queue.
A worker UnlockPipelineAndArtifactsWorker
will pick up each pipeline ID individually and proceed to unlock the pipeline, its job artifacts and its pipeline artifacts.
This would result in smaller queries that operate on a single pipeline and a small subset of job artifacts and pipeline artifacts each time.
Proposed implementation
Refer to #406255 (closed)