Secret Detection: scan results are incorrectly identifying brackets as secret

Summary

There have been a few reports of our Secret Detection scan reporting brackets as a secret. A Reddit post briefly describes this behavior.

Our initial hunch is that Secret Detection is working correctly and has in fact detected a secret, but the blob URL rendered in the UI points to the incorrect revision of the file. This is based on the following facts:

1. The majority of SD patterns include a common prefix, so the chance of false positive detections is low. SD certainly cannot report brackets as secrets, because even some of the more relaxed patterns have minimum length constraints.
2. SD can essentially run in two modes:
   1. A "no git" scan, which analyses the current state of a repository. This is the default behaviour for scans on the main/default branch.
   2. A "commit range" scan, which analyses the state of a repository at each commit SHA. Only the diffs between commits are analysed. This is the behaviour when SD is executed in an MR pipeline, when SECRET_DETECTION_HISTORIC_SCAN is enabled, or when a custom commit range is provided through the SECRET_DETECTION_LOG_OPTIONS variable.

   When mode 2 is used, extra commit information is attached to each vulnerability finding (see the References section below for an example).
3. Because SD running a "commit range" scan literally traverses the Git history, it is not possible for SD to attach a commit SHA that doesn't exist in its current clone of the repository.
4. SD in its default configuration (i.e. when you include this snippet in your CI config) will perform scans every time you commit, in an MR pipeline, and again on the default branch when an MR is merged. Each scan is mutually exclusive. Data is not ingested into the Vulnerability report until a scan on the default branch succeeds. Because of point 3, that scan would be traversing the commits on the default branch and attaching commit SHAs that exist on that branch.
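
The conditions under which each scan mode applies can be sketched as a small decision function. This is an illustration of the prose only, not the analyzer's actual implementation: the function name `select_scan_mode`, the `is_mr_pipeline` flag, and the way variable values are interpreted (a `SECRET_DETECTION_HISTORIC_SCAN` value of `true`, any non-empty `SECRET_DETECTION_LOG_OPTIONS`) are assumptions.

```python
from typing import Mapping

NO_GIT = "no git"
COMMIT_RANGE = "commit range"


def select_scan_mode(env: Mapping[str, str], is_mr_pipeline: bool) -> str:
    """Return which of the two scan modes would apply.

    Hypothetical helper: it mirrors the description in this issue,
    not the analyzer's source code.
    """
    # Assumption: the historic scan is enabled with the string "true".
    historic = env.get("SECRET_DETECTION_HISTORIC_SCAN", "").lower() == "true"
    # Assumption: any non-empty value counts as a custom commit range.
    custom_range = bool(env.get("SECRET_DETECTION_LOG_OPTIONS", "").strip())

    if is_mr_pipeline or historic or custom_range:
        return COMMIT_RANGE
    # Otherwise only the current state of the repository is analysed.
    return NO_GIT
```

Under these assumptions, a plain default-branch pipeline with neither variable set yields a "no git" scan, so its findings would carry no commit information.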

Reach of this bug

Support was recently added to properly execute SD within MR pipelines. Previously, we would only reliably scan the most recent commit in an MR branch. After this change, all commits in an MR branch are scanned. This has likely led to more findings that include commit information but, because of point 3 above, it should not affect the validity of the JSON scan report.

References

Example finding from `gl-secret-detection-report.json` when a "commit range" scan is run

{
    "description": "PGP private key secret has been found in commit ceb666b7.",
    "name": "PGP private key",
    "raw_source_code_extract": "-----BEGIN PGP PRIVATE KEY BLOCK-----",
    "scanner": {
        "id": "gitleaks",
        "name": "Gitleaks"
    },
    "message": "PGP private key detected; please remove and revoke it if this is a leak.",
    "category": "secret_detection",
    "severity": "Critical",
    "id": "1f0a4881fe77508aef863706b8544af70c71d1eef0ec771336d6b6b469863124",
    "cve": "badfile-1:8a1ddaa7dab83b9a5ffe092bc3aad1df30c2e0be6246aee0a0da616d628d1da7:PGP private key",
    "confidence": "Unknown",
    "location": {
        "file": "badfile-1",
        "commit": {
            "author": "James Liu",
            "sha": "ceb666b75cf3d4e7c71f5328f02c791849bde6ad",
            "date": "2023-03-21T03:13:47Z",
            "message": "Merge branch 'main-patch-7ef3' into 'main'\n\nAdd secrets in multiple commits\n\nSee merge request jamesliu-gitlab/testing-398036!2\n\ncommit 62af10e245cd59681a73eb6c10e5cebd5481aac4\nAuthor: James Liu <jliu@gitlab.com>\nDate:   Tue Mar 21 03:13:47 2023 +0000\n\nAdd secrets in multiple commits"
        },
        "start_line": 1
    },
    "identifiers": [
        {
            "name": "Gitleaks rule ID PGP private key",
            "type": "gitleaks_rule_id",
            "value": "PGP private key"
        }
    ]
}
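
One way to test the hunch that the UI links to the wrong revision is to build a blob URL pinned to the commit SHA recorded in the finding and compare it with the URL the UI renders. The following is a minimal sketch, assuming a finding shaped like the example above. The function name `pinned_blob_url` and the `project_url` argument are hypothetical, and the `/-/blob/<ref>/<path>#L<line>` layout is the usual GitLab blob URL form, not something taken from the analyzer.

```python
from typing import Optional
from urllib.parse import quote


def pinned_blob_url(project_url: str, finding: dict) -> Optional[str]:
    """Build a blob URL that points at the exact commit in a finding.

    Returns None when the finding carries no commit information,
    which is what a "no git" scan is expected to produce.
    """
    location = finding.get("location", {})
    commit = location.get("commit")
    if not commit or not commit.get("sha"):
        return None

    # quote() leaves "/" intact by default, so nested paths survive.
    path = quote(location["file"])
    url = f"{project_url.rstrip('/')}/-/blob/{commit['sha']}/{path}"

    line = location.get("start_line")
    if line:
        url += f"#L{line}"
    return url
```

If the URL rendered in the UI references a different ref than the one produced here, the scan report itself is likely correct and the problem lies in how the link is built.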

Steps to reproduce

See this comment below

Example Project

I created a test project to try various configurations of SD and see whether I could reproduce the problem.
