Skip to content

Create worker to store security reports by project

What does this MR do and why?

Technical context

UPSERT queries require acquiring locks on unique index tuples. This will cause lock contention if multiple processes try to UPSERT records with the same unique attributes. The lock contention will make each process wait for the other to complete.

Historical context

The StoreSecurityReportsWorker job has the lock-contention issue described above. It was discovered in this production incident.

Further details from the incident root analysis

We run StoreSecurityReportsWorker for each pipeline for the default branch if it has security reports. In that worker, within a transaction, we create records for different tables. One of those tables is vulnerability_identifiers, and the query to create the records for that table is an UPSERT query.

There were many pipelines for a single project, which caused running many StoreSecurityReportsWorker jobs in parallel. Each of which tried to UPSERT the same records, causing lock contention and long transaction times.1

We have implemented a temporary solution in !147816 (merged)

This MR

The short-term solution resolves the lock contention by, in effect, making the jobs run sequentially. However, going through these jobs sequentially can take a long time (somewhere on the order of 3.25 hours to 20 hours2)

In this change, we implement a long-term solution that replaces the problematic job with a similar job that can make use of our existing sidekiq deduplication tooling

With this change, even if a single project suddenly has many pipelines created for it, only one job will be scheduled and the rest will be de-duplicated

Follow-up work

SQL

Queries from the StoreSecurityReportsByProjectWorker

from here
SELECT
    "ci_pipeline_metadata".*
FROM
    "ci_pipeline_metadata"
WHERE
    "ci_pipeline_metadata"."project_id" = 278964
    AND "ci_pipeline_metadata"."security_scans_created_at" IS NOT NULL
ORDER BY
    "ci_pipeline_metadata"."security_scans_created_at" DESC
LIMIT 1
from here
SELECT
    "ci_pipelines"."ref",
    "ci_pipelines"."sha",
    "ci_pipelines"."before_sha",
    "ci_pipelines"."created_at",
    "ci_pipelines"."updated_at",
    "ci_pipelines"."tag",
    "ci_pipelines"."yaml_errors",
    "ci_pipelines"."committed_at",
    "ci_pipelines"."project_id",
    "ci_pipelines"."status",
    "ci_pipelines"."started_at",
    "ci_pipelines"."finished_at",
    "ci_pipelines"."duration",
    "ci_pipelines"."user_id",
    "ci_pipelines"."lock_version",
    "ci_pipelines"."pipeline_schedule_id",
    "ci_pipelines"."source",
    "ci_pipelines"."config_source",
    "ci_pipelines"."protected",
    "ci_pipelines"."failure_reason",
    "ci_pipelines"."iid",
    "ci_pipelines"."merge_request_id",
    "ci_pipelines"."source_sha",
    "ci_pipelines"."target_sha",
    "ci_pipelines"."external_pull_request_id",
    "ci_pipelines"."ci_ref_id",
    "ci_pipelines"."locked",
    "ci_pipelines"."partition_id",
    "ci_pipelines"."id",
    "ci_pipelines"."auto_canceled_by_id"
FROM
    "ci_pipelines"
WHERE
    "ci_pipelines"."id" = 21
LIMIT 1

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #452005

  1. gitlab-com/gl-infra/production#17754 (comment 1830395525)

  2. gitlab-com/gl-infra/production#17754 (comment 1831442417)

Edited by Michael Becker

Merge request reports