Skip to content

Post-process leaked credentials on all branches

What does this MR do and why?

This merge request:

Please keep in mind the following points while reviewing:

  • ScanSecurityReportSecretsWorker is moved out of StoreSecurityReportsWorker to make sure they run separately:
    • I opted for this to:
    • I decided to not have them both run for all pipelines, as the explanation on how vulnerabilities are created from security reports implied that ingestion report had to happen on the default branch, and I didn't want to mess up with that process as I'm still figuring my way around this part of the codebase.
  • Additionally, vulnerability_findings is completely replaced with security_findings in ScanSecurityReportSecretsWorker because the latter exist for all pipelines, and not just the default branch as is the case with the former. I'm not entirely sure if this will have side-effects later on though, but I tried to minimize them by using each associated vulnerability_finding of a security_finding while collecting revocable_keys.
  • I had to remove a lot of spec examples in which private methods were being tested, as this is often considered a bad practice. Instead, I tried to write spec examples in such a way that tests the behaviour of the service classes or workers without calling private methods directly.
  • I'm not sure what happens for security_findings that didn't have raw_source_code_extract persisted before, do we have to backfill them? 🤔 cc/ @theoretick wdyt?

Resolves #299212 (closed).

Database query plans + SQL

This merge request updates ScanSecurityReportSecretsWorker to:

  1. Use each_batch to iterate over security_findings in batches (see this line).

And Security::StoreScansService to:

  1. To use security_scans instead of security_findings to deteremine if token revocation should take place (see this line).

In doing so, the following SQL queries are introduced/updated, respectively:

Query 1

Raw SQL

-- initial query to get the first id
SELECT
    "security_findings"."id"
FROM
    "security_findings"
    INNER JOIN "security_scans" ON "security_findings"."scan_id" = "security_scans"."id"
    INNER JOIN "security_scans" "scans_security_findings" ON "scans_security_findings"."id" = "security_findings"."scan_id"
WHERE
    "security_scans"."pipeline_id" = 810828014
    AND "security_scans"."scan_type" = 5
ORDER BY
    "security_findings"."id" ASC
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16909/commands/57439

--- second query to get maximum findings by offset
SELECT
    "security_findings"."id"
FROM
    "security_findings"
    INNER JOIN "security_scans" ON "security_findings"."scan_id" = "security_scans"."id"
    INNER JOIN "security_scans" "scans_security_findings" ON "scans_security_findings"."id" = "security_findings"."scan_id"
WHERE
    "security_scans"."pipeline_id" = 810828014
    AND "security_scans"."scan_type" = 5
    AND "security_findings"."id" >= 1
ORDER BY
    "security_findings"."id" ASC
LIMIT 1 OFFSET 100

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16909/commands/57440

-- actual query
SELECT
    "security_findings".*
FROM
    "security_findings"
    INNER JOIN "security_scans" ON "security_findings"."scan_id" = "security_scans"."id"
    INNER JOIN "security_scans" "scans_security_findings" ON "scans_security_findings"."id" = "security_findings"."scan_id"
WHERE
    "security_scans"."pipeline_id" = 810828014
    AND "security_scans"."scan_type" = 5
    AND "security_findings"."id" >= 1

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16909/commands/57441

Note: a random master pipeline for which the secret_detection job had 170 leaks reported was used.

Query 2

Raw SQL

SELECT
    1 AS one
FROM
    "security_scans"
WHERE
    "security_scans"."pipeline_id" = 813814339
    AND "security_scans"."scan_type" = 5
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/16880/commands/57331

Please also see:

  • These two threads on why each_batch is introduced: 1, 2.
  • This thread on preloading vulnerability and vulnerability_findings.

Screenshots or screen recordings

Please see the screen recording below to see this in action.

post-process-leaked-credentials-on-all-pipelines-comp

How to set up and validate locally

To validate locally, please follow the steps below.

  • In your local GDK setup, create a new project if you don't have one already.
  • Add a .gitlab-ci.yml file to the project, which includes the secret-detection template, like the example below.
image: ruby:latest

include:
  - template: Jobs/Secret-Detection.gitlab-ci.yml
  • From Settings > Access Tokens, create a new project access token.
  • Create a new file with .rb extension in the repository. Let's call it gitlab_token.rb, for example.
  • Paste the project access token you created inside the file, as shown below.
  gitlab_token = "PASTE_PROJECT_ACCESS_TOKEN_HERE"
  • Save the file, and create a new branch, then commit the file to the branch.
  • Push the branch to your repository, and wait until the pipeline finishes running.
  • When the secret_detection job is finished, the access token should be revoked already.
  • Visit Settings > Access Tokens again to verify the token is no longer displayed under active tokens. 🎉

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Ahmed Hemdan

Merge request reports