Customer problem: Track secret detection findings better as they move

Note: This issue contains discussions of many possible solutions. We will explore individual solutions in related issues.

Problem

Secret Detection findings are tracked based only on the location of the secret found. (See #387583 (comment 1342668506) for additional details.)

Changing a file that contains a detected secret can move the secret to a new location, which creates a new finding and leads to triage fatigue. Even if the original finding has already been triaged, the finding created for the new location appears as Detected (new), with no link to the earlier ones.

In other words, secrets pop up again as soon as their location changes, which can happen very often in large repos like http://gitlab.com/gitlab-org/gitlab.

Example:

GitLab PAT is leaked in README.md on line 3. Secret Detection creates a finding.
New lines are added above it, moving the same PAT to line 5. Secret Detection creates a separate new finding, and marks the old one "No longer detected".
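
The example can be sketched in a few lines. This is only an illustration, not the actual tracking code: `location_fingerprint` is a hypothetical helper, and hashing the file path and line number is an assumed stand-in for "tracked based only on the location".

```python
import hashlib


def location_fingerprint(file_path: str, line: int) -> str:
    """Hypothetical location-only fingerprint: the secret value plays no part."""
    raw = f"{file_path}:{line}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# The same PAT, first on line 3 of README.md, then on line 5.
before_edit = location_fingerprint("README.md", 3)
after_edit = location_fingerprint("README.md", 5)

# The fingerprints differ, so the moved secret looks like a brand new finding.
same_finding = before_edit == after_edit
```

Because nothing about the secret itself goes into the hash, any edit that shifts the line number breaks the link to the triaged finding.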

Proposal

Secrets are conveniently isolated from their context: their location doesn't matter, and only their value determines whether a leak is really a leak. Therefore, the location should be removed from the fingerprint.

This could mean creating the identifier/fingerprint for the leak based exclusively on the secret value.
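
A minimal sketch of a value-only fingerprint, under stated assumptions: `value_fingerprint` is a hypothetical name, SHA-256 is an arbitrary choice of hash, and the token below is a made-up placeholder. The value is hashed so that the fingerprint does not store the secret in plain text.

```python
import hashlib


def value_fingerprint(secret_value: str) -> str:
    """Hypothetical fingerprint derived exclusively from the secret value."""
    return hashlib.sha256(secret_value.encode("utf-8")).hexdigest()


# Placeholder token, not a real credential.
pat = "glpat-EXAMPLE-NOT-A-REAL-TOKEN"

# The file path and line number are never passed in, so moving the
# secret from line 3 to line 5 cannot change the fingerprint.
fingerprint_on_line_3 = value_fingerprint(pat)
fingerprint_on_line_5 = value_fingerprint(pat)
stable_across_moves = fingerprint_on_line_3 == fingerprint_on_line_5

# A different secret still produces a different fingerprint.
other_secret_differs = value_fingerprint(pat + "-rotated") != fingerprint_on_line_3
```

One consequence of this design is that the same secret leaked in two files would share a single fingerprint, which is what the alternative proposal below builds on.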

Details and possible challenges

The soon-to-be-removed (and confusingly named) cve field comes closer to these semantics: it includes the location, the hashed value, and the rule ID. See discussion and references in #387583 (comment 1339703977).

This would mean that every rule would need to extract the complete secret value, not just a static portion of it like `-----BEGIN PRIVATE KEY-----` or `"service_account:"-----BEGIN PRIVATE KEY-----`.

We would also need to think through the rollout of this change. It would be bad if the change immediately caused every existing finding to be re-created all at once.
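
To illustrate why the complete value matters, here is a rough sketch. The two patterns are hypothetical and are not the actual Secret Detection rules; the key bodies are dummy text.

```python
import hashlib
import re

# Hypothetical rule that matches only the static header of a private key.
HEADER_ONLY = re.compile(r"-----BEGIN PRIVATE KEY-----")

# Hypothetical rule that captures the whole key block.
FULL_BLOCK = re.compile(
    r"-----BEGIN PRIVATE KEY-----.*?-----END PRIVATE KEY-----", re.DOTALL
)


def fingerprint(pattern, text):
    """Hash whatever the rule extracted; None if the rule did not match."""
    match = pattern.search(text)
    if match is None:
        return None
    return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()


key_a = "-----BEGIN PRIVATE KEY-----\nAAAA\n-----END PRIVATE KEY-----"
key_b = "-----BEGIN PRIVATE KEY-----\nBBBB\n-----END PRIVATE KEY-----"

# Header-only extraction: two different keys collapse into one fingerprint.
header_collides = fingerprint(HEADER_ONLY, key_a) == fingerprint(HEADER_ONLY, key_b)

# Full-block extraction: each key gets its own fingerprint.
full_distinct = fingerprint(FULL_BLOCK, key_a) != fingerprint(FULL_BLOCK, key_b)
```

With a value-only fingerprint, a rule that extracts just the static header would make every private key in a project look like the same finding.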

Alternative proposal

If the location accepts an array (Update Security Report format to make the locat... (#37405)), we could link all the locations where a specific secret has been detected. Not only would this make triage more efficient, but it would also make the findings more meaningful. A new location would not change the status of a Secret Detection finding; it would just be added to the finding's list of locations. This list should be reset on each scan.
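
A rough sketch of how this could behave, assuming findings are keyed by a value-based fingerprint. The `Finding` and `Location` types and the `apply_scan` function are hypothetical and do not reflect the Security Report schema. What should happen to a finding whose location list ends up empty is left open here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    file: str
    line: int


@dataclass
class Finding:
    fingerprint: str
    state: str = "detected"
    locations: list = field(default_factory=list)


def apply_scan(findings: dict, scan_results: list) -> None:
    """Update findings in place from one scan's (fingerprint, location) pairs."""
    # Reset the location lists so they reflect only the current scan.
    for finding in findings.values():
        finding.locations = []

    for fingerprint, location in scan_results:
        # A known fingerprint keeps its triage state; an unknown one is new.
        finding = findings.setdefault(fingerprint, Finding(fingerprint))
        if location not in finding.locations:
            finding.locations.append(location)


# A finding that was already triaged at README.md line 3.
findings = {
    "abc123": Finding("abc123", state="dismissed",
                      locations=[Location("README.md", 3)]),
}

# The next scan sees the same secret on line 5 and in a second file.
apply_scan(findings, [
    ("abc123", Location("README.md", 5)),
    ("abc123", Location("docs/setup.md", 10)),
])
```

After the scan, the finding is still `dismissed` and lists both current locations, so nothing new needs to be triaged.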

Edited Dec 05, 2023 by Connor Gilbert