Reduce false positives (FP) in secret detection by adding word boundaries

Problem to solve

In gitlab-org/security-products/analyzers/secrets!221 (comment 1415815305), @jamesliu-gitlab mentioned that some of our custom regular expressions defined in gitleaks.toml ensure that tokens are not matched when they are part of a longer string, but are matched when they are part of an assignment statement. For example, the fake token:

sk-000000000000000000000000000000000000000000000000

should not be matched when it is embedded in a longer string:

RANDOMTEXTsk-000000000000000000000000000000000000000000000000RANDOMTEXT

but should be matched when it appears in an assignment:

password="sk-000000000000000000000000000000000000000000000000"
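One way to get this behavior is to wrap a pattern in RE2 word boundaries (`\b`), which Go's `regexp` package supports. The sketch below is illustrative only: the `sk-[0-9A-Za-z]{48}` body and the 48-character length are assumptions, not the actual rule from gitleaks.toml, and the fake token is built with `strings.Repeat` rather than copied from the example above. It compares an unbounded pattern with a bounded one on the three inputs from this issue.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for the fake "sk-" token; the 48-character body
// length is an assumption for illustration, not the real gitleaks.toml rule.
var (
	// unbounded matches the token anywhere, including inside longer strings.
	unbounded = regexp.MustCompile(`sk-[0-9A-Za-z]{48}`)
	// bounded requires an ASCII word boundary (\b) on both sides, so the token
	// must not be preceded or followed by a letter, digit, or underscore.
	bounded = regexp.MustCompile(`\bsk-[0-9A-Za-z]{48}\b`)
)

// fakeToken is "sk-" followed by 48 zeros.
var fakeToken = "sk-" + strings.Repeat("0", 48)

func main() {
	inputs := []string{
		fakeToken,                              // bare token
		"RANDOMTEXT" + fakeToken + "RANDOMTEXT", // embedded in a longer string
		`password="` + fakeToken + `"`,          // assignment
	}
	for _, s := range inputs {
		fmt.Printf("unbounded=%v bounded=%v %s\n",
			unbounded.MatchString(s), bounded.MatchString(s), s)
	}
}
```

The embedded string matches only the unbounded pattern, while the bare token and the assignment match both. Because `\b` treats `_` as a word character, a bounded rule would also skip a token directly adjacent to an underscore, which may or may not be the desired behavior for a given rule.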

gitlab-org/security-products/analyzers/secrets!223 (merged) adds unit tests that check whether each regular expression adheres to this rule. Most do not, so some secret detection findings are likely false positives caused by substrings of longer strings that happen to match a rule's regular expression.

Proposal

Remove all rule exclusions from TestGitleaksTOMLRegexp and update gitleaks.toml so that every rule passes.

Then run fp-benchmark-test on the MR to measure whether the changes reduce false positives.

Edited Sep 08, 2023 by Craig Smith