Discovery: direction for allowing new analyzers to take over reporting for existing vulnerabilities

Background

With the previous technical discovery work in the attached epic we have solutioned on how to allow the rails app to reassociate previously reported findings and vulnerabilities to newly reported ones. This mechanism (in theory) works similarly to the introduced vulnerability finding signatures work in &5144 (closed).

There are however a number of outstanding questions on how to best approach this problem and a lot of potential solutions, so before we choose a exact direction, we should explore the focus questions to solve prior to SWAGing a solution

Focus Questions

Is it important to maintain a clear audit trail for findings (i.e. whether "Date Detected" remains accurate)?
If we were to explore reassociating existing findings by alternative identifier, where does this occur?
Must the primary identifier of a finding accurately reflect the reporting analyzer?
Which solution enables takeover of third-party findings?

Potential Solutions

Don't remap findings

1. Do nothing, just remove/disable analyzer - [status: REJECTED]

Results in "M findings resolved" and "N findings introduced".

Pro: minimal work
Con: poor UX, poor audit trail

2. Auto-resolve findings in subsequent scans - [status: REJECTED]

If we were to do this, it would still require customers either opt-in to resolve data issues or enable-by-default.

Pro: Simple design, contributes to multiple group roadmaps, avoids over-fixation on analyzer internals
Con: degrades audit trail (age of finding no longer represents length of time in codebase)

See &5708 (closed)

Remap findings

3A. Change primary identifier of new analyzers to match old within analyzers (A stands for "Analyzer") - [status: REJECTED]

This is likely the lowest effort solution while compromising some data accuracy. Essentially, change semgrep's bandit findings to return the bandit identifier as the primary identifier.

Pro: low effort
Con: hacky

See #331551 (closed)

3B. Change primary identifier of new analyzers to match old within monolith - [status: Attempted]

This is likely the lowest effort solution while compromising some data accuracy. Add a remapping to represent the direction of {eslint: :semgrep}.

Essentially, ingest semgrep findings and reorder them on the fly to move bandit identifiers to primary position above semgrep.

Pro: low effort
Con: hacky

See #331551 (closed)

4. Make secondary identifier match if there is no primary identifier (or uuid) match - [status: Accepted]

This is likely to be the most accurate but introduces some complex N+1 problems with iterating over all identifier results comparatively.

Pro: Enables associations of any report kind (across report types + third party)
Con: Complex

See #331626 (closed)

4. Implementation

(See #353271 (comment 859225232))

Swap ANALYZER_ORDER placing semgrep above corresponding SAST analyzers.

When processing a security_finding, we attempt to find vulnerability_finding by UUID.

If UUID match, update vulnerability_finding

If No UUID match, calculate UUIDs for secondary identifiers (non-generic) attempt vulnerability_finding lookup by UUIDs

If No UUIDs match, insert new vulnerability_finding.

5. Implement finding groups - [status: REJECTED]

By re-associating findings to vulnerabilities as a one-to-many we provide a means of groupings findings with the same vulnerability. This would essentially treat the vulnerability record as stateful and maintain an associated audit trail between previous findings from reports and new ones.

Pro: maintain audit trails (via finding relationships)
Con: high effort, unclear side-effects

Relates to &2652 (closed)

Edited Apr 05, 2022 by Lucas Charles