Migrate Secure Job Artifacts to Security Scans
What does this MR do?
Triggers a Background Migration that converts all Ci::JobArtifact
records with a file type corresponding to Security Reports sast
/dependency scanning
/container scanning
/dast
to a Security::Scan
record.
New jobs run on CI that produce a Security Report will have a Security::Scan
record created as part of a separate worker. The migration in this MR is idempotent, such that if this case is encountered or the background migration is run again a duplicate record will not be created.
This MR in part resolves issue #10250 (closed).
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Database performance
The following SQL queries will run:
Temporary index
CREATE INDEX CONCURRENTLY job_artifacts_id_type_job_timestamps_temp_index ON public.ci_job_artifacts
USING btree (id, file_type, job_id, created_at, updated_at)
WHERE file_type BETWEEN 5 and 8
The above index will take around 43
minutes to create, and uses 49MB
of space. The index will be removed after the migration is complete, in a different MR.
Scheduling migration
- There are
337_500_000
job artifacts on CI - Of these,
1_059_310
are security reports: file_type 5, 6, 7 or 8
In batches of 10_000
, the scheduling migration filters job artifacts by security reports and schedules BackgroundWorkers. The scheduling migration will therefore run queries like the following, 1_059_310 / 10_000 = 106
times:
SELECT "ci_job_artifacts"."id"
FROM "ci_job_artifacts"
WHERE "ci_job_artifacts"."file_type" IN (5, 6, 7, 8)
AND "ci_job_artifacts"."id" >= 1500000
ORDER BY "ci_job_artifacts"."id" ASC
LIMIT 1
OFFSET 10000
https://explain.depesz.com/s/D4R4
SELECT MIN(id), MAX(id)
FROM "ci_job_artifacts"
WHERE "ci_job_artifacts"."file_type" IN (5, 6, 7, 8)
AND "ci_job_artifacts"."id" >= 1500000
AND "ci_job_artifacts"."id" < 1510000
https://explain.depesz.com/s/9Pa7
Background migration
- This migration is run a maximum of
106
times. - There are up to
10_000
records being inserted in each run. - The actual number of records per executed run is expected to be significantly lower.
- We don't expect linear distribution of security scans in each migration.
- Each migration explain plan details that the migration will take less than 1ms. This is hard to verify, as this will surely be affected by the the number of records to insert.
- With a
2min
delay between migration jobs, it is estimated it will take212
minutes to complete all of the migration jobs.
Each scheduled migration runs SQL such as the following:
INSERT INTO security_scans (created_at, updated_at, build_id, scan_type)
SELECT ci_job_artifacts.created_at, ci_job_artifacts.updated_at, ci_job_artifacts.job_id, ci_job_artifacts.file_type - 4
FROM ci_job_artifacts
WHERE ci_job_artifacts.id BETWEEN 1500000 AND 1510000
AND ci_job_artifacts.file_type BETWEEN 5 and 8
ON CONFLICT (build_id, scan_type) DO NOTHING;