Create worker to store security reports by project
What does this MR do and why?
Technical context
UPSERT
queries require acquiring locks on unique index tuples. This will cause lock contention if multiple processes try to UPSERT
records with the same unique attributes. The lock contention will make each process wait for the other to complete.
Historical context
The StoreSecurityReportsWorker
job has the lock-contention issue described above. It was discovered in this production incident.
Further details from the incident root analysis
We run StoreSecurityReportsWorker
for each pipeline for the default branch if it has security reports. In that worker, within a transaction, we create records for different tables. One of those tables is vulnerability_identifiers
, and the query to create the records for that table is an UPSERT
query.
There were many pipelines for a single project, which caused running many StoreSecurityReportsWorker
jobs in parallel. Each of which tried to UPSERT
the same records, causing lock contention and long transaction times.1
We have implemented a temporary solution in !147816 (merged)
This MR
The short-term solution resolves the lock contention by, in effect, making the jobs run sequentially. However, going through these jobs sequentially can take a long time (somewhere on the order of 3.25 hours to 20 hours2)
In this change, we implement a long-term solution that replaces the problematic job with a similar job that can make use of our existing sidekiq deduplication tooling
With this change, even if a single project suddenly has many pipelines created for it, only one job will be scheduled and the rest will be de-duplicated
Follow-up work
-
[Feature flag] Rollout of `deduplicate_security... (#460476) • Michael Becker • 17.1 -
remove the feature flag -
delete original job / rename new job from StoreSecurityReportsByProjectWorker
toStoreSecurityReportsWorker
-
SQL
StoreSecurityReportsByProjectWorker
Queries from the - explain without index: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/28290/commands/88299
- explain with index: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/28290/commands/88303
from here
SELECT
"ci_pipeline_metadata".*
FROM
"ci_pipeline_metadata"
WHERE
"ci_pipeline_metadata"."project_id" = 278964
AND "ci_pipeline_metadata"."security_scans_created_at" IS NOT NULL
ORDER BY
"ci_pipeline_metadata"."security_scans_created_at" DESC
LIMIT 1
from here
SELECT
"ci_pipelines"."ref",
"ci_pipelines"."sha",
"ci_pipelines"."before_sha",
"ci_pipelines"."created_at",
"ci_pipelines"."updated_at",
"ci_pipelines"."tag",
"ci_pipelines"."yaml_errors",
"ci_pipelines"."committed_at",
"ci_pipelines"."project_id",
"ci_pipelines"."status",
"ci_pipelines"."started_at",
"ci_pipelines"."finished_at",
"ci_pipelines"."duration",
"ci_pipelines"."user_id",
"ci_pipelines"."lock_version",
"ci_pipelines"."pipeline_schedule_id",
"ci_pipelines"."source",
"ci_pipelines"."config_source",
"ci_pipelines"."protected",
"ci_pipelines"."failure_reason",
"ci_pipelines"."iid",
"ci_pipelines"."merge_request_id",
"ci_pipelines"."source_sha",
"ci_pipelines"."target_sha",
"ci_pipelines"."external_pull_request_id",
"ci_pipelines"."ci_ref_id",
"ci_pipelines"."locked",
"ci_pipelines"."partition_id",
"ci_pipelines"."id",
"ci_pipelines"."auto_canceled_by_id"
FROM
"ci_pipelines"
WHERE
"ci_pipelines"."id" = 21
LIMIT 1
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #452005