GitLab Semgrep and SpotBugs analyzers are causing duplicates in the Vulnerability report
Summary
A recent change, "Sort vulnerability links and identifiers" (gitlab-org/security-products/analyzers/report!116, merged; Adam Cohen; 18.3), made the output of GitLab secure analyzers deterministic by sorting the vulnerabilities[].identifiers[] field in the reports they produce. However, sorting this field introduced a bug: the first element of vulnerabilities[].identifiers[] is the primary identifier, so it has special significance and must always remain the first element.
The Rails monolith uses the primary identifier to determine which vulnerabilities already exist and which ones are new. If a new pipeline sorts the vulnerabilities[].identifiers[] list and the first element is moved (for example, to the end of the list), the primary identifier changes. The existing vulnerability is then no longer recognized, and duplicate entries show up in the Vulnerability report.
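To make the failure mode concrete, here is a simplified Go sketch. The Identifier and Vulnerability types, the matchKey function, and the sample identifiers are all hypothetical; the real matching logic in the Rails monolith is more involved. The sketch only shows that when a finding is keyed on identifiers[0], sorting the whole list can produce a key that no longer matches the one stored from the previous pipeline, so the same finding looks new.

```go
package main

import (
	"fmt"
	"sort"
)

// Identifier is a simplified, hypothetical model of one entry in
// vulnerabilities[].identifiers[] (the url field is omitted).
type Identifier struct {
	Type  string
	Name  string
	Value string
}

// Vulnerability is a simplified finding. Location is a hypothetical
// stand-in for the real location data in the report.
type Vulnerability struct {
	Identifiers []Identifier
	Location    string
}

// matchKey is a hypothetical stand-in for how existing findings are
// recognized: it depends only on the primary identifier (index 0) and location.
func matchKey(v Vulnerability) string {
	p := v.Identifiers[0]
	return p.Type + ":" + p.Value + "@" + v.Location
}

// sortAll sorts every identifier, including index 0. This mimics the buggy
// behavior: the primary identifier can be displaced.
func sortAll(ids []Identifier) {
	sort.SliceStable(ids, func(i, j int) bool {
		if ids[i].Type != ids[j].Type {
			return ids[i].Type < ids[j].Type
		}
		return ids[i].Value < ids[j].Value
	})
}

func main() {
	// Pipeline 1: semgrep_id comes first, so it is the primary identifier.
	first := Vulnerability{
		Identifiers: []Identifier{
			{Type: "semgrep_id", Name: "rule-1", Value: "rule-1"},
			{Type: "cwe", Name: "CWE-89", Value: "89"},
		},
		Location: "app/db.go:42",
	}
	known := map[string]bool{matchKey(first): true}

	// Pipeline 2: the same finding, with all identifiers sorted.
	second := Vulnerability{
		Identifiers: append([]Identifier(nil), first.Identifiers...),
		Location:    first.Location,
	}
	sortAll(second.Identifiers)

	fmt.Println("primary before:", first.Identifiers[0].Type)   // primary before: semgrep_id
	fmt.Println("primary after:", second.Identifiers[0].Type)   // primary after: cwe
	fmt.Println("matched existing:", known[matchKey(second)]) // matched existing: false
}
```

Because "cwe" sorts before "semgrep_id", the second pipeline's finding gets a different key and is not matched to the stored one, which corresponds to the duplicate entries described above.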
This bug affects the following analyzer versions:
Note: SpotBugs and Semgrep were the only analyzers affected by this bug. All other analyzers only ever produce a single element in the vulnerabilities[].identifiers[] list, so sorting cannot change their primary identifier.
GitLab Advanced SAST (GLAS) was not affected because it still uses report v6.0.0. Gemnasium uses report v5.13.0, so it is not affected either.
See also https://gitlab.com/gitlab-com/request-for-help/-/issues/3457#note_2787772223
Steps to reproduce
1. Create a new project rfh-3457-4 and add a gl-sast-report.json where semgrep_id is the first element in the list of identifiers:
2. View the Vulnerability report. It shows the following vulnerability severity counts:
   4 high, 10 medium, 2 low
3. Update the status of all vulnerabilities to Confirmed in the Vulnerability report.
4. Update gl-sast-report.json and place semgrep_id as the last element in the list of identifiers:
5. Update gl-sast-report.json and place semgrep_id as the first element in the list of identifiers:
6. View the Vulnerability report. Notice that the vulnerability counts have doubled:
   - From 4 high, 10 medium, 2 low
   - To 8 high, 20 medium, 4 low
Example Project
What is the current bug behavior?
Duplicate vulnerabilities show up in the Vulnerability report when switching to semgrep v6.7.0
What is the expected correct behavior?
Duplicate vulnerabilities should not appear in the Vulnerability report
Implementation Plan
1. Update the vulnerabilities[].identifiers[] sorting logic so that it only sorts the elements at indices 1..N of the list. In other words, the element at index 0 (the primary identifier) must not be moved.
2. Release report v6.2.1 with the fix from step 1 above.
3. Update the following analyzers to report v6.2.1:



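Step 1 of the implementation plan could look roughly like the following Go sketch. The Identifier type and the comparison order in less are assumptions for illustration, not the report library's actual code; the point it demonstrates is that only the slice after index 0 is sorted, so the primary identifier never moves while the remaining identifiers still come out in a deterministic order.

```go
package main

import (
	"fmt"
	"sort"
)

// Identifier is a simplified, hypothetical model of one entry in
// vulnerabilities[].identifiers[] (the url field is omitted).
type Identifier struct {
	Type  string
	Name  string
	Value string
}

// less orders identifiers by type, then value, then name so that the
// output is deterministic. This comparison order is an assumption.
func less(a, b Identifier) bool {
	if a.Type != b.Type {
		return a.Type < b.Type
	}
	if a.Value != b.Value {
		return a.Value < b.Value
	}
	return a.Name < b.Name
}

// sortSecondaryIdentifiers sorts ids[1:] in place and leaves ids[0],
// the primary identifier, exactly where the analyzer put it.
func sortSecondaryIdentifiers(ids []Identifier) {
	if len(ids) < 2 {
		return // nothing to sort beyond the primary identifier
	}
	rest := ids[1:] // shares the backing array with ids
	sort.SliceStable(rest, func(i, j int) bool { return less(rest[i], rest[j]) })
}

func main() {
	// Hypothetical sample identifiers for one Semgrep finding.
	ids := []Identifier{
		{Type: "semgrep_id", Name: "rule-1", Value: "rule-1"},
		{Type: "owasp", Name: "A03:2021", Value: "A03:2021"},
		{Type: "cwe", Name: "CWE-89", Value: "89"},
	}
	sortSecondaryIdentifiers(ids)
	for _, id := range ids {
		fmt.Println(id.Type, id.Value)
	}
	// Prints:
	// semgrep_id rule-1
	// cwe 89
	// owasp A03:2021
}
```

Because rest is a subslice of ids, sorting it reorders ids[1:] in place without copying. The length guard matters: slicing an empty slice with ids[1:] would panic.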