Post-process leaked credentials on all branches
What does this MR do and why?
This merge request:
- Updates
security_finding_data
JSON schema to includeraw_source_code_extract
. - Updates
Security:: StoreFindingsService
to persistraw_source_code_extract
. - Updates
Security::StoreScansService
to:- Schedule
ScanSecurityReportSecretsWorker
separately fromStoreSecurityReportsWorker
. - Use
security_scans
instead ofsecurity_findings
to determine if token revocation should take place.
- Schedule
- Updates
ScanSecurityReportSecretsWorker
to depend onsecurity_findings
instead ofvulnerability_findings
. - Updates Security Report Ingestion Overview document to reflect the updated process.
- And finally, updates all related specs to reflect the changes above.
Please keep in mind the following points while reviewing:
-
ScanSecurityReportSecretsWorker
is moved out ofStoreSecurityReportsWorker
to make sure they run separately:- I opted for this to:
- Ensure report ingestion will keep running solely for the default branch.
- While token revocation will run for all pipelines.
- I decided to not have them both run for all pipelines, as the explanation on how vulnerabilities are created from security reports implied that ingestion report had to happen on the default branch, and I didn't want to mess up with that process as I'm still figuring my way around this part of the codebase.
- I opted for this to:
- Additionally,
vulnerability_findings
is completely replaced withsecurity_findings
inScanSecurityReportSecretsWorker
because the latter exist for all pipelines, and not just the default branch as is the case with the former.I'm not entirely sure if this will have side-effects later on though, but I tried to minimize them by using each associatedvulnerability_finding
of asecurity_finding
while collectingrevocable_keys
. - I had to remove a lot of spec examples in which private methods were being tested, as this is often considered a bad practice. Instead, I tried to write spec examples in such a way that tests the behaviour of the service classes or workers without calling private methods directly.
I'm not sure what happens forsecurity_findings
that didn't haveraw_source_code_extract
persisted before, do we have to backfill them?🤔 cc/ @theoretick wdyt?
Resolves #299212 (closed).
Database query plans + SQL
This merge request updates ScanSecurityReportSecretsWorker
to:
- Use
each_batch
to iterate oversecurity_findings
in batches (see this line).
And Security::StoreScansService
to:
- To use
security_scans
instead ofsecurity_findings
to deteremine if token revocation should take place (see this line).
In doing so, the following SQL queries are introduced/updated, respectively:
Query 1
Raw SQL
-- initial query to get the first id
SELECT
"security_findings"."id"
FROM
"security_findings"
INNER JOIN "security_scans" ON "security_findings"."scan_id" = "security_scans"."id"
INNER JOIN "security_scans" "scans_security_findings" ON "scans_security_findings"."id" = "security_findings"."scan_id"
WHERE
"security_scans"."pipeline_id" = 810828014
AND "security_scans"."scan_type" = 5
ORDER BY
"security_findings"."id" ASC
LIMIT 1
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16909/commands/57439
--- second query to get maximum findings by offset
SELECT
"security_findings"."id"
FROM
"security_findings"
INNER JOIN "security_scans" ON "security_findings"."scan_id" = "security_scans"."id"
INNER JOIN "security_scans" "scans_security_findings" ON "scans_security_findings"."id" = "security_findings"."scan_id"
WHERE
"security_scans"."pipeline_id" = 810828014
AND "security_scans"."scan_type" = 5
AND "security_findings"."id" >= 1
ORDER BY
"security_findings"."id" ASC
LIMIT 1 OFFSET 100
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16909/commands/57440
-- actual query
SELECT
"security_findings".*
FROM
"security_findings"
INNER JOIN "security_scans" ON "security_findings"."scan_id" = "security_scans"."id"
INNER JOIN "security_scans" "scans_security_findings" ON "scans_security_findings"."id" = "security_findings"."scan_id"
WHERE
"security_scans"."pipeline_id" = 810828014
AND "security_scans"."scan_type" = 5
AND "security_findings"."id" >= 1
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16909/commands/57441
Note: a random master pipeline for which the secret_detection
job had 170 leaks reported was used.
Query 2
Raw SQL
SELECT
1 AS one
FROM
"security_scans"
WHERE
"security_scans"."pipeline_id" = 813814339
AND "security_scans"."scan_type" = 5
LIMIT 1
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16880/commands/57331
Please also see:
- These two threads on why
each_batch
is introduced: 1, 2. - This thread on preloading
vulnerability
andvulnerability_findings
.
Screenshots or screen recordings
Please see the screen recording below to see this in action.
post-process-leaked-credentials-on-all-pipelines-comp
How to set up and validate locally
To validate locally, please follow the steps below.
- In your local GDK setup, create a new project if you don't have one already.
- Add a
.gitlab-ci.yml
file to the project, which includes thesecret-detection
template, like the example below.
image: ruby:latest
include:
- template: Jobs/Secret-Detection.gitlab-ci.yml
- From Settings > Access Tokens, create a new project access token.
- Create a new file with
.rb
extension in the repository. Let's call itgitlab_token.rb
, for example. - Paste the project access token you created inside the file, as shown below.
gitlab_token = "PASTE_PROJECT_ACCESS_TOKEN_HERE"
- Save the file, and create a new branch, then commit the file to the branch.
- Push the branch to your repository, and wait until the pipeline finishes running.
- When the
secret_detection
job is finished, the access token should be revoked already. - Visit Settings > Access Tokens again to verify the token is no longer displayed under active tokens.
🎉
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.