Discovery: direction for allowing new analyzers to take over reporting for existing vulnerabilities
Background
With the previous technical discovery work in the attached epic we have solutioned on how to allow the rails app to reassociate previously reported findings and vulnerabilities to newly reported ones. This mechanism (in theory) works similarly to the introduced vulnerability finding signatures work in &5144.
There are however a number of outstanding questions on how to best approach this problem and a lot of potential solutions, so before we choose a exact direction, we should explore the focus questions to solve prior to SWAGing a solution
Focus Questions
- Is it important to maintain a clear audit trail for findings (i.e. whether "Date Detected" remains accurate)?
- If we were to explore reassociating existing findings by alternative identifier, where does this occur?
- Must the primary identifier of a finding accurately reflect the reporting analyzer?
- Which solution enables takeover of third-party findings?
Potential Solutions
Don't remap findings
1. Do nothing, just remove/disable analyzer - [status: REJECTED]
Results in "M findings resolved" and "N findings introduced".
Pro: minimal work
Con: poor UX, poor audit trail
2. Auto-resolve findings in subsequent scans - [status: REJECTED]
If we were to do this, it would still require customers either opt-in to resolve data issues or enable-by-default.
Pro: Simple design, contributes to multiple group roadmaps, avoids over-fixation on analyzer internals
Con: degrades audit trail (age of finding no longer represents length of time in codebase)
See &5708
Remap findings
3A. Change primary identifier of new analyzers to match old within analyzers (A stands for "Analyzer") - [status: REJECTED]
This is likely the lowest effort solution while compromising some data accuracy. Essentially, change semgrep's bandit findings to return the bandit identifier as the primary identifier.
Pro: low effort
Con: hacky
See #331551 (closed)
Attempted]
3B. Change primary identifier of new analyzers to match old within monolith - [status:This is likely the lowest effort solution while compromising some data accuracy. Add a remapping to represent the direction of {eslint: :semgrep}
.
Essentially, ingest semgrep findings and reorder them on the fly to move bandit identifiers to primary position above semgrep.
Pro: low effort
Con: hacky
See #331551 (closed)
Accepted]
4. Make secondary identifier match if there is no primary identifier (or uuid) match - [status:This is likely to be the most accurate but introduces some complex N+1 problems with iterating over all identifier results comparatively.
Pro: Enables associations of any report kind (across report types + third party)
Con: Complex
See #331626 (closed)
4. Implementation
(See #353271 (comment 859225232))
Swap ANALYZER_ORDER
placing semgrep above corresponding SAST analyzers.
When processing a security_finding
, we attempt to find vulnerability_finding
by UUID.
If UUID match, update vulnerability_finding
If No UUID match,
calculate UUIDs for secondary identifiers (non-generic)
attempt vulnerability_finding
lookup by UUIDs
If No UUIDs match, insert new vulnerability_finding
.
5. Implement finding groups - [status: REJECTED]
By re-associating findings
to vulnerabilities
as a one-to-many we provide a means of groupings findings with the same vulnerability. This would essentially treat the vulnerability
record as stateful and maintain an associated audit trail between previous findings from reports and new ones.
Pro: maintain audit trails (via finding relationships)
Con: high effort, unclear side-effects
Relates to &2652 (closed)