Deduplicating or Remapping SAST findings with the same fingerprint
Problem to solve
As we explore Semgrep analyzers and rules more fully, we will be creating new analyzers that effectively duplicate what we have today. We should find a way to deduplicate these findings so that we do not create two findings, and therefore two vulnerabilities, when more than one analyzer reports the same vulnerability while scanning projects of a given language.
This is a common issue across devopssecure as other teams deal with updating or creating vulnerability_occurrences that map to the same flaw identified by different scanners:
- SAST: bandit -> semgrep (and beyond)
- DS: bundler-audit -> gemnasium
- DAST: zap -> browserker
- CS: klar -> trivy
Further Details
As part of &5245 we are considering replacing analyzers/bandit with analyzers/semgrep with a duplicate ruleset. Following our current processes, this will result in a new scanner that returns the same findings with different identifiers (identifiers[0].type: "semgrep", not identifiers[0].type: "bandit").
There is concern this will duplicate DB records and break all existing vulnerability -> vulnerability_occurrences mappings.
The naive solution is to simply duplicate findings and rely on users to handle deduplication, but if possible we should attempt to preserve the relationship between our returned findings and the DB records for auditing and data integrity.
- Should we be concerned about duplicating findings? (same location, likely different data/descriptions)
- Are findings with an identical location but a different scanner duplicates?
- Is there a flexible way we can remap findings?
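To make the concern concrete, here is a minimal sketch of how two scanners reporting the same flaw can end up as separate records. It assumes a finding is keyed on its location plus its primary identifier; that keying, the `Finding` fields, the file path, and the rule IDs are all hypothetical and are not taken from the actual implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    scanner: str           # e.g. "bandit" or "semgrep"
    identifier_type: str   # mirrors identifiers[0].type in the report
    identifier_value: str  # hypothetical rule ID
    file: str              # hypothetical location fields
    start_line: int


def identifier_scoped_key(f: Finding) -> str:
    """Hypothetical key built from location plus primary identifier."""
    raw = f"{f.file}:{f.start_line}:{f.identifier_type}:{f.identifier_value}"
    return hashlib.sha256(raw.encode()).hexdigest()


def location_only_key(f: Finding) -> str:
    """Hypothetical key built from location alone, ignoring the scanner."""
    raw = f"{f.file}:{f.start_line}"
    return hashlib.sha256(raw.encode()).hexdigest()


def count_distinct(findings, key_fn) -> int:
    """Number of records that would exist if findings were keyed by key_fn."""
    return len({key_fn(f) for f in findings})


# The same flaw reported by both analyzers (illustrative values only).
bandit = Finding("bandit", "bandit", "B303", "app/crypto.py", 12)
semgrep = Finding("semgrep", "semgrep", "bandit.B303", "app/crypto.py", 12)
```

Under these assumptions, `count_distinct([bandit, semgrep], identifier_scoped_key)` treats the two reports as different findings, while `count_distinct([bandit, semgrep], location_only_key)` collapses them into one. Whether location alone is a safe key is exactly the second question above.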
With the work in https://gitlab.com/groups/gitlab-org/-/epics/4690 we are exploring ways to rely on a new tracking field to separate file location from what we use to track movement of a finding, but the current scope does not include report types or identifiers. See the WIP documentation MR for more explanation of this idea.
Proposal
TBD
Architectural Support
- Reminder: 72-hour SLA
- Due Date: 2021-02-05
- DRI: @theoretick
Scope Checklist
- Does not involve architectural decisions
- Is after-the-fact
- Is not already covered by architecture guidelines/handbook
- Has a broad impact within #secure
- Is a new unit of work
- Is strictly #secure
- Could not come to an agreement (escalation)
- Involves architectural decisions
See the scope scoring table below to interpret the checkboxes above
Scope Scoring Table
Reason | in | opt-in | out |
---|---|---|---|
Does not involve architectural decisions | | | |
Is after-the-fact | | | |
Is not already covered by architecture guidelines/handbook | | | |
Has a broad impact within Secure | | | |
Is a new unit of work | | | |
Is strictly Secure | | | |
Could not come to an agreement (escalation) | ? | | |
Involves architectural decisions | | | |
Reviewed by
🤖 Auto-Summary
Discoto Usage
Points
Discussion points are declared by headings, list items, and single lines that start with the text point: (case-insensitive). For example, the following are all valid points:
#### POINT: This is a point
* point: This is a point
+ Point: This is a point
- pOINT: This is a point
point: This is a **point**
Note that any markdown used in the point text will also be propagated into the topic summaries.
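The matching rule described above can be sketched with a single regular expression. This is an illustration of the rule as stated here, not Discoto's actual parser; the names `POINT_RE` and `extract_points` are hypothetical.

```python
import re
from typing import List

# Optional heading ("#" to "######") or list marker ("*", "+", "-"),
# then the literal text "point:" matched case-insensitively.
POINT_RE = re.compile(
    r"^\s*(?:#{1,6}\s+|[*+-]\s+)?point:\s*(.*\S)\s*$",
    re.IGNORECASE,
)


def extract_points(text: str) -> List[str]:
    """Return the point text of every line that declares a point."""
    points = []
    for line in text.splitlines():
        match = POINT_RE.match(line)
        if match:
            # Markdown inside the point text is kept untouched.
            points.append(match.group(1))
    return points


sample = """#### POINT: This is a point
* point: This is a point
+ Point: This is a point
- pOINT: This is a point
point: This is a **point**
This line mentions point: but does not start with it
"""
```

Calling `extract_points(sample)` returns the five declared points and skips the last line, because the marker must appear at the start of the line (after an optional heading or list prefix). Swapping `point:` for `outcome:` in the pattern would cover outcome declarations the same way.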
Outcomes
Outcomes define the decisions or resolutions of a discussion. Once outcomes are defined, sub-topics and points are collapsed underneath the outcomes.
Outcomes are declared in a similar manner as points:
#### OUTCOME: This is an outcome
* outcome: This is an outcome
+ Outcome: This is an outcome
- oUTCOME: This is an outcome
outcome: This is an outcome
Note that multiple outcomes may be declared for each topic.
Topics
Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.
Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with topic: (case-insensitive). For example, the following are all valid topics:
# Topic: Inline discussion topic 1
## TOPIC: **{+A Green, bolded topic+}**
### tOpIc: Another topic
Quick Actions
Action | Description |
---|---|
/discuss sub-topic TITLE | Create an issue for a sub-topic. Does not work in epics |
/discuss link ISSUABLE-LINK | Link an issuable as a child of this discussion |
Discussion-Size Indicators
The relative size of the discussion occurring within a topic and its sub-topics is indicated via braille dots.
More dots means that more points or sub-topics exist within a given topic.
Examples:
- TOPIC ⣿⣿⡆ A large discussion occurred here
- TOPIC ⣇ A smaller discussion occurred here
Last updated by this job
TOPIC ⣇
TOPIC ⣿ ⢀ what can be delivered within %13.9 #299589 (comment 501733971)
- IMO within %13.9 we can dedupe but we cannot remap once we drop bandit entirely #299589 (comment 501893053)
TOPIC ⣿⡀ ⢠ Standardizing on Identifiers #299589 (comment 501902338)
- Inject CWE identifiers (if missing) into findings from officially supported scanners #299589 (comment 501902338)
- CWE is always primary, unless CVE exists. CVE > CWE #299589 (comment 501902338)
- stable identifiers support organizations in SLOs and audit trails #299589 (comment 506245456)
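The identifier points above (inject a CWE if missing, CVE > CWE, CWE otherwise primary) can be sketched as a small normalization step. This is only an illustration of the proposed ordering: the `type`/`name`/`value` fields are assumed to match the report's identifier objects, and `normalize_identifiers`, the `fallback_cwe` parameter, and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

Identifier = Dict[str, str]

# Lower rank sorts first: CVE, then CWE, then everything else.
_PRIORITY = {"cve": 0, "cwe": 1}


def normalize_identifiers(
    identifiers: List[Identifier],
    fallback_cwe: Optional[str] = None,
) -> List[Identifier]:
    """Inject a CWE identifier if none is present, then order CVE > CWE > rest."""
    result = list(identifiers)
    has_cwe = any(i["type"].lower() == "cwe" for i in result)
    if not has_cwe and fallback_cwe is not None:
        result.append(
            {"type": "cwe", "name": f"CWE-{fallback_cwe}", "value": fallback_cwe}
        )
    # sorted() is stable, so scanner-specific identifiers keep their order.
    return sorted(result, key=lambda i: _PRIORITY.get(i["type"].lower(), 2))


# Illustrative findings for the same flaw from two scanners.
bandit_ids = normalize_identifiers(
    [{"type": "bandit_test_id", "name": "Bandit Test ID B303", "value": "B303"}],
    fallback_cwe="327",
)
semgrep_ids = normalize_identifiers(
    [{"type": "semgrep_id", "name": "bandit.B303", "value": "bandit.B303"}],
    fallback_cwe="327",
)
```

After normalization both findings lead with the same CWE identifier, which gives them a primary identifier that does not depend on the scanner. Where the fallback CWE would come from for each rule is left open here.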
Discoto Settings
---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending
See the settings schema for details.