Remove SSN rule from secret detection ruleset
Proposal
Remove SSN detection from the default Secret Detection ruleset.
Reasoning
Reasons to remove:
- Secret Detection generally focuses on credentials, while SSN detection is closer to a data loss prevention (DLP) rule.
- Numeric patterns are much less readily identifiable than more structured secret formats, making reliable detection more difficult. This applies even when known-invalid SSNs are excluded. See False positives in gitleaks SSN checking (#328256 - closed) and Secret Detection: Tighten regex for Social Secu... (#353038 - closed).
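To illustrate the numeric-pattern point, here is a minimal sketch. It does not use the analyzer's actual rule: the regex, the function names, and the sample strings are all hypothetical. The validity checks are based on the commonly documented never-issued ranges (area 000, 666, or 900-999; group 00; serial 0000). Even with those excluded, any identifier that happens to have the 3-2-4 digit shape is still reported.

```python
import re

# Hypothetical pattern (an assumption, not the real ruleset entry):
# three digits, two digits, four digits, separated by hyphens.
SSN_SHAPE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def is_structurally_valid(area: str, group: str, serial: str) -> bool:
    """Reject combinations that are known never to be issued as SSNs."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def find_candidates(text: str) -> list[str]:
    """Return every 3-2-4 digit match that survives the validity checks."""
    return [
        match.group(0)
        for match in SSN_SHAPE.finditer(text)
        if is_structurally_valid(*match.groups())
    ]


if __name__ == "__main__":
    # Made-up sample lines; none of these is a real SSN.
    samples = [
        "order_id = '412-07-9931'",   # an order number with the same shape
        "part no. 305-22-1180",       # a part number with the same shape
        "fixture = '000-12-3456'",    # filtered: invalid area
        "fixture = '666-45-1234'",    # filtered: invalid area
    ]
    for line in samples:
        print(line, "->", find_candidates(line))
```

In this sketch the order number and the part number are both reported, because nothing in the digits themselves distinguishes them from an SSN. A structured credential format with a fixed prefix does not have this problem to the same degree.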
Reasons to keep:
- See original reasoning in Add social security number detection to secret ... (#242119 - closed).
- There have been some recent improvements to this rule's allowlist, such as gitlab-org/security-products/analyzers/secrets!161 (merged).
Overall, the balance of factors seems to favor removing the rule, since false positives at the Critical severity are highly disruptive. At a minimum, our users have had to resolve this finding type tens of thousands of times.
I (@connorgilbert) am open to other opinions, but my feeling is that SSNs are categorically different from everything else we try to detect in Secret Detection (more DLP than Secret Detection). Allowlisting known-invalid SSNs still leaves us open to many digit patterns that aren’t in fact SSNs. A couple of past issues relate to excluding SSNs that the Social Security Administration reserves or treats as invalid, but my personal read is that the better move is to stop trying to find SSNs entirely.
Links/references
Internal links (team members only):