Determine patterns to use for secret detection scanning

We need to determine well-defined patterns to be used in this check.

Decide if the checks used in the investigation spike are performant, or if we should start with a subset.

We may look to the Vulnerability Research group for guidance and/or assistance in making this determination.

Shall we use the gitleaks.toml file from the secrets analyzer?

Probably yes. The secrets analyzer is likely to continue to be maintained in the long term, and we should avoid maintaining two separate sets of rules (one in the analyzer and one in the gem). Using the same configuration also gives us some level of parity between our CI-based product offering and this new feature.

Finally, our benchmarking spike based its results on matching blobs against the entire gitleaks.toml ruleset, so I believe it only makes sense to use the same configuration for our implementation. Please read more on this topic in the README of our Secret Detection Go POC (internal only).
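
For illustration only, matching a blob against every rule in a ruleset could look roughly like the sketch below. It is written in Python for brevity and is not the gem's or the Go POC's implementation. The rule entries are hypothetical: they only mirror the id/description/regex shape of gitleaks rules, and the patterns are simplified stand-ins rather than entries copied from gitleaks.toml. The names RULES, Finding, compile_rules, and scan_blob are likewise assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical rules shaped like gitleaks `[[rules]]` entries.
# The patterns are simplified illustrations, NOT copied from gitleaks.toml.
RULES = [
    {
        "id": "example_personal_access_token",
        "description": "Example personal access token",
        "regex": r"glpat-[0-9a-zA-Z_\-]{20}",
    },
    {
        "id": "example_aws_access_key",
        "description": "Example AWS access key ID",
        "regex": r"AKIA[0-9A-Z]{16}",
    },
]


@dataclass
class Finding:
    rule_id: str
    line_number: int


def compile_rules(rules):
    """Compile every rule once so repeated blob scans reuse the patterns."""
    return [(rule["id"], re.compile(rule["regex"])) for rule in rules]


def scan_blob(blob, compiled_rules):
    """Check each line of the blob against every rule in the ruleset."""
    findings = []
    for line_number, line in enumerate(blob.splitlines(), start=1):
        for rule_id, pattern in compiled_rules:
            if pattern.search(line):
                findings.append(Finding(rule_id, line_number))
    return findings
```

Compiling the rules once up front keeps the per-blob cost down to the matching itself, which is the part a benchmark against the full ruleset would measure. Note that gitleaks patterns are written for Go's regexp engine, so a real implementation would need to confirm that each pattern behaves the same in whichever engine is used.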

Decision

The secrets we will initially be detecting can be found at https://gitlab.com/gitlab-org/gitlab/-/blob/9b2563cd093ce04844d4b820b6c2e9c2a4501b96/gems/gitlab-secret_detection/lib/gitleaks.toml. There are 35 rules.

Edited Jan 04, 2024 by rossfuhrman
