Skip to content

Migrate Secure Job Artifacts to Security Scans

Cameron Swords requested to merge migrate-security-scans into master

What does this MR do?

Triggers a Background Migration that converts all Ci::JobArtifact records with a file type corresponding to Security Reports sast/dependency scanning/container scanning/dast to a Security::Scan record.

New jobs run on CI that produce a Security Report will have a Security::Scan record created as part of a separate worker. The migration in this MR is idempotent, such that if this case is encountered or the background migration is run again a duplicate record will not be created.

This MR in part resolves issue #10250 (closed).

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Database performance

The following SQL queries will run:

Temporary index

CREATE INDEX CONCURRENTLY job_artifacts_id_type_job_timestamps_temp_index ON public.ci_job_artifacts 
USING btree (id, file_type, job_id, created_at, updated_at) 
WHERE file_type BETWEEN 5 and 8

The above index will take around 43 minutes to create, and uses 49MB of space. The index will be removed after the migration is complete, in a different MR.

Scheduling migration

  • There are 337_500_000 job artifacts on CI
  • Of these, 1_059_310 are security reports: file_type 5, 6, 7 or 8

In batches of 10_000, the scheduling migration filters job artifacts by security reports and schedules BackgroundWorkers. The scheduling migration will therefore run queries like the following, 1_059_310 / 10_000 = 106 times:

SELECT "ci_job_artifacts"."id" 
FROM "ci_job_artifacts" 
WHERE "ci_job_artifacts"."file_type" IN (5, 6, 7, 8) 
AND "ci_job_artifacts"."id" >= 1500000 
ORDER BY "ci_job_artifacts"."id" ASC 
LIMIT 1
OFFSET 10000

https://explain.depesz.com/s/D4R4

SELECT MIN(id), MAX(id) 
FROM "ci_job_artifacts" 
WHERE "ci_job_artifacts"."file_type" IN (5, 6, 7, 8) 
AND "ci_job_artifacts"."id" >= 1500000 
AND "ci_job_artifacts"."id" < 1510000

https://explain.depesz.com/s/9Pa7

Background migration

  • This migration is run a maximum of 106 times.
  • There are up to 10_000 records being inserted in each run.
  • The actual number of records per executed run is expected to be significantly lower.
  • We don't expect linear distribution of security scans in each migration.
  • Each migration explain plan details that the migration will take less than 1ms. This is hard to verify, as this will surely be affected by the the number of records to insert.
  • With a 2min delay between migration jobs, it is estimated it will take 212 minutes to complete all of the migration jobs.

Each scheduled migration runs SQL such as the following:

INSERT INTO security_scans (created_at, updated_at, build_id, scan_type)
SELECT ci_job_artifacts.created_at, ci_job_artifacts.updated_at, ci_job_artifacts.job_id, ci_job_artifacts.file_type - 4
FROM ci_job_artifacts
WHERE ci_job_artifacts.id BETWEEN 1500000 AND 1510000
AND ci_job_artifacts.file_type BETWEEN 5 and 8
ON CONFLICT (build_id, scan_type) DO NOTHING;

https://explain.depesz.com/s/EC0F

Edited by Cameron Swords

Merge request reports