[Spike] Exclusions for Pipeline SD

Overview

The Secret Detection Exclusions feature was introduced as part of the Secret Push Protection GA release. Exclusions are stored in the database, then retrieved and applied during secret scanning. They cover three types of secrets:

Secrets matching a rule from the default ruleset, e.g. gitlab_pipeline_trigger_token.
Secrets found in files whose path matches either a specific path (e.g. spec/app/project_spec.rb) or a simple glob (e.g. spec/**/*.rb).
Secrets matching a specific raw value, e.g. dummyfaketoken-1234567890.
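
As an illustration of the three exclusion types listed above, here is a minimal sketch of how a single finding could be checked against a list of exclusions. The `Exclusion` and `Finding` classes and the `"rule"`, `"path"` and `"raw_value"` labels are hypothetical names chosen for this example, not the actual data model. Path matching uses Python's `fnmatchcase`, which only approximates glob semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Exclusion:
    kind: str   # hypothetical labels: "rule", "path" or "raw_value"
    value: str


@dataclass(frozen=True)
class Finding:
    rule_id: str
    path: str
    raw_value: str


def is_excluded(finding: Finding, exclusions: list[Exclusion]) -> bool:
    """Return True if at least one exclusion covers the finding."""
    for exclusion in exclusions:
        if exclusion.kind == "rule" and finding.rule_id == exclusion.value:
            return True
        # fnmatchcase lets "*" cross "/" boundaries, so this is an approximation.
        if exclusion.kind == "path" and fnmatchcase(finding.path, exclusion.value):
            return True
        if exclusion.kind == "raw_value" and finding.raw_value == exclusion.value:
            return True
    return False


# The three examples from the list above.
exclusions = [
    Exclusion("rule", "gitlab_pipeline_trigger_token"),
    Exclusion("path", "spec/**/*.rb"),
    Exclusion("raw_value", "dummyfaketoken-1234567890"),
]
```

A finding is dropped as soon as any one exclusion matches, so the order of the exclusions does not matter.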

This spike explores the technical feasibility of, and the direction for, applying those exclusions in Pipeline Secret Detection (Pipeline SD), which runs isolated in a container as part of a CI job and doesn't have access to GitLab's monolith database.

Proposal

See also the discussion below for more information.

We came up with two approaches to each aspect of applying exclusions to Pipeline SD, and want to confirm the viability of at least one of them:

1️⃣ Exclusions Retrieval and Injection

~~Injecting exclusions via CI Job Artifacts~~

In this approach, before a secret_detection job runs, we load the project's SD exclusions from the database and write them to a job artifact that the secret_detection job can read. If the artifact is found, the secrets analyzer reads the file and applies the exclusions.

This is ruled out in favour of the two other approaches described below.

Injecting exclusions via a file stored in the working directory

In this approach, a before_script is added to the Secret Detection CI template and is used to:

Call an API endpoint (e.g. GraphQL Project.securityExclusions query) to retrieve exclusions.
Save the list of exclusions to a file stored in the working directory of the container running secret_detection job.

The analyzer could then read the exclusions from that file during scanning and apply them.
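
The two steps above can be sketched roughly as follows. This is not the actual before_script: the GraphQL field selection (`nodes { type value }`), the `Bearer` authorization header and the helper names are assumptions made for illustration, and the URL, project path and token are expected to be supplied by the CI job.

```python
import json
import urllib.request

# Assumed query shape: the fields selected under securityExclusions are illustrative.
QUERY = """
query($fullPath: ID!) {
  project(fullPath: $fullPath) {
    securityExclusions { nodes { type value } }
  }
}
"""


def build_request(graphql_url: str, project_path: str, token: str) -> urllib.request.Request:
    """Build the POST request for the exclusions query (no network access here)."""
    body = json.dumps({"query": QUERY, "variables": {"fullPath": project_path}}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": f"Bearer {token}"}
    return urllib.request.Request(graphql_url, data=body, headers=headers, method="POST")


def extract_exclusions(response: dict) -> list[dict]:
    """Pull the exclusion nodes out of the response, tolerating missing keys."""
    project = (response.get("data") or {}).get("project") or {}
    nodes = (project.get("securityExclusions") or {}).get("nodes") or []
    return [{"type": node["type"], "value": node["value"]} for node in nodes]


def save_exclusions(exclusions: list[dict], path: str) -> None:
    """Write the exclusions as JSON to a file in the working directory."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(exclusions, handle, indent=2)


def fetch_and_save(graphql_url: str, project_path: str, token: str, path: str) -> None:
    """Step 1: call the API. Step 2: save the result for the analyzer."""
    with urllib.request.urlopen(build_request(graphql_url, project_path, token)) as reply:
        save_exclusions(extract_exclusions(json.load(reply)), path)
```

Keeping the request building and the response parsing separate from the network call makes both easy to check without a GitLab instance.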

Injecting exclusions via ENVs

Another idea is to inject the exclusions in a similar way (retrieved through an API endpoint as discussed above), but instead of saving them to a file, pass the list of exclusions to the analyzer as an environment variable on initialization.

This isn't an ideal solution, because environment variables are too simplistic for our needs, but it is still worth exploring.
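
A minimal sketch of the environment-variable route, assuming the list is serialised as a JSON array into a single variable. The variable name `SECRET_DETECTION_EXCLUSIONS` and both helper functions are hypothetical.

```python
import json
import os

ENV_NAME = "SECRET_DETECTION_EXCLUSIONS"  # hypothetical variable name


def encode_exclusions(exclusions: list[dict]) -> str:
    """Serialise the exclusions into one compact string for the variable."""
    return json.dumps(exclusions, separators=(",", ":"))


def load_exclusions_from_env(environ=os.environ) -> list[dict]:
    """Decode the exclusions on analyzer start-up; an unset variable means none."""
    raw = environ.get(ENV_NAME, "")
    if not raw.strip():
        return []
    parsed = json.loads(raw)
    if not isinstance(parsed, list):
        raise ValueError(f"{ENV_NAME} must contain a JSON array")
    return parsed
```

Because the whole list has to fit into one string, every exclusion type and value must survive serialisation and whatever quoting the CI configuration applies.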

2️⃣ Exclusions Processing

Processing exclusions via Analyzer Engine (i.e. common module)

One way to process exclusions is to add the handling logic to the common module. In that scenario, the analyzer reads the exclusions from either the file stored in the working directory or from environment variables, and applies them before the report is generated. This is similar to how the report module currently filters out vulnerabilities/secrets from disabled rules.

Processing exclusions via CI Components

Another idea is to introduce a new CI component that runs through the report generated by the Secret Detection CI template or component and removes any finding that matches an exclusion configured for the project.
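
The post-processing step such a component would perform might look like the sketch below. It assumes the report is a JSON document with a top-level `vulnerabilities` array whose entries carry `identifiers`, `location.file` and `raw_source_code_extract`. These field names, the exclusion type labels and the function names are all assumptions for illustration and should be checked against the real report schema.

```python
import json
from fnmatch import fnmatchcase


def matches(vulnerability: dict, exclusion: dict) -> bool:
    """Check one finding against one exclusion (field names are assumed)."""
    kind, value = exclusion["type"], exclusion["value"]
    if kind == "rule":
        return any(i.get("value") == value for i in vulnerability.get("identifiers", []))
    if kind == "path":
        # fnmatchcase only approximates glob semantics ("*" also matches "/").
        return fnmatchcase(vulnerability.get("location", {}).get("file", ""), value)
    if kind == "raw_value":
        return vulnerability.get("raw_source_code_extract") == value
    return False


def filter_report(report: dict, exclusions: list[dict]) -> dict:
    """Return a copy of the report without the excluded findings."""
    kept = [
        vulnerability
        for vulnerability in report.get("vulnerabilities", [])
        if not any(matches(vulnerability, exclusion) for exclusion in exclusions)
    ]
    return {**report, "vulnerabilities": kept}


def rewrite_report(report_path: str, exclusions_path: str) -> None:
    """Load both files, filter, and overwrite the report in place."""
    with open(exclusions_path, encoding="utf-8") as handle:
        exclusions = json.load(handle)
    with open(report_path, encoding="utf-8") as handle:
        report = json.load(handle)
    with open(report_path, "w", encoding="utf-8") as handle:
        json.dump(filter_report(report, exclusions), handle)
```

Filtering after the fact leaves the analyzer untouched, at the cost of an extra pass over the report.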

Progress

Below is a summary of the approaches discussed in the proposal section above:

| Exclusion Injection | Exclusion Processing | POC/Demo |
| --- | --- | --- |
| Stored in File | Applied in `secrets` analyzer (before report is generated) | See #503184 (comment 2321427953). ✅ |
| Stored in File | Applied in a new CI Component (after report is generated) | See #503184 (comment 2332994757). ✅ |
| Passed in ENVs | Applied in `secrets` analyzer (before report is generated) | See #503184 (comment 2321427953). ✅ |
| Passed in ENVs | Applied in a new CI Component (after report is generated) | See #503184 (comment 2333109292). ✅ |

Note: For all approaches, exclusions are retrieved from the database via the GraphQL API.

Edited Feb 05, 2025 by Ahmed Hemdan