Reduce false positives (FP) in secret detection by adding word boundaries
Problem to solve
In gitlab-org/security-products/analyzers/secrets!221 (comment 1415815305), @jamesliu-gitlab mentioned that some of our custom regular expressions defined in gitleaks.toml ensure that tokens are not matched when they're part of a longer string, but are matched when they're part of an assignment statement. For example, the fake token
sk-000000000000000000000000000000000000000000000000
should not be matched when it's enclosed in a long string
RANDOMTEXTsk-000000000000000000000000000000000000000000000000RANDOMTEXT
but should match when included in an assignment
password="sk-000000000000000000000000000000000000000000000000"
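The difference can be illustrated with a minimal Go sketch. The two patterns below are hypothetical and are not copied from gitleaks.toml; the 48-character token length is also an assumption made for the example. It compares a pattern without word boundaries against one wrapped in `\b`:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fakeToken mirrors the shape of the fake token above: "sk-" followed by
// a run of zeros. The length of 48 is an assumption made for this sketch.
var fakeToken = "sk-" + strings.Repeat("0", 48)

// Hypothetical patterns for illustration; not the actual rules in gitleaks.toml.
var (
	unbounded = regexp.MustCompile(`sk-[a-zA-Z0-9]{48}`)
	bounded   = regexp.MustCompile(`\bsk-[a-zA-Z0-9]{48}\b`)
)

func main() {
	embedded := "RANDOMTEXT" + fakeToken + "RANDOMTEXT"
	assignment := `password="` + fakeToken + `"`

	// Report, for each pattern, whether it matches each of the two inputs.
	for _, re := range []*regexp.Regexp{unbounded, bounded} {
		fmt.Printf("%-28s embedded=%t assignment=%t\n",
			re.String(), re.MatchString(embedded), re.MatchString(assignment))
	}
}
```

The unbounded pattern matches both inputs, while the bounded one matches only the assignment. Note that `\b` in Go's regexp package is an ASCII word boundary, so a token preceded by a non-word character such as `-` would still match.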
gitlab-org/security-products/analyzers/secrets!223 (merged) adds unit tests that check whether each regular expression adheres to this rule. Most do not, so some secret vulnerability findings are likely to be false positives caused by parts of long strings that happen to match a rule's regular expression.
Proposal
Remove all of the rule exclusions from TestGitleaksTOMLRegexp and update gitleaks.toml so that every rule passes.
Then run the fp-benchmark-test on the MR to see whether the changes reduce false positives.