Determine patterns to use for secret detection scanning
We need to determine well-defined patterns to be used in this check.
Decide if the checks used in the investigation spike are performant, or if we should start with a subset.
We may look to the Vulnerability Research group for guidance and/or assistance in making this determination.
Shall we use the gitleaks.toml file from the secrets analyzer?
Probably yes. The secrets analyzer is likely to continue to be maintained in the long term, and we should avoid maintaining two separate sets of rules (one in the analyzer and one in the gem). Using the same configuration also gives us some level of parity between our CI-based product offering and this new feature.
Finally, our benchmarking spike based its results on matching blobs against the entire gitleaks.toml ruleset, so I believe it only makes sense to use the same configuration for our implementation. Please read more on this topic in the README of our Secret Detection Go POC (internal only).
Decision
The secrets we will initially be detecting can be found at https://gitlab.com/gitlab-org/gitlab/-/blob/9b2563cd093ce04844d4b820b6c2e9c2a4501b96/gems/gitlab-secret_detection/lib/gitleaks.toml. There are 35 rules.