Expand security analyzer common functionality to include fingerprinting of vulnerability source
Problem to solve
Our ~Secure analyzers need to uniquely fingerprint each occurrence of a vulnerability so that we have a stable key to compare against. In most cases the line number and rule identifier are enough to build a unique key, but this approach has two problems:
- Fingerprints should remain the same when a line number changes. Because vulnerability feedback is tied to a fingerprint, any change to the file that shifts the line number produces a non-matching fingerprint, and the feedback is no longer associated with the vulnerability (see related discussion https://gitlab.com/gitlab-org/gitlab-ee/issues/6590).
- If a line has multiple occurrences of the same class of vulnerability, each occurrence results in the same comparison key.
Several of our analyzers are currently susceptible to this:
Target audience
- Sasha, Software Developer, https://design.gitlab.com/research/personas#persona-sasha
- Sam, Security Analyst, https://design.gitlab.com/research/personas#persona-sam
Proposal
Update common library with functionality for generating a digest of the source code for a given vulnerability.
At a minimum this will require reading a file by filePath and Line; however, for accurate fingerprinting we would want to read by column as well (see problem #2 above).
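As a rough illustration of the proposal, the sketch below hashes the flagged source text together with the rule identifier instead of the line number. It is written in Python for brevity and is not the common library's actual API: the names source_fingerprint and fingerprint_file, the choice of SHA-256, the whitespace stripping, and the column conventions (1-based line, 0-based end-exclusive columns) are all assumptions made for this example.

```python
import hashlib
from typing import Optional


def source_fingerprint(
    source: str,
    rule_id: str,
    line: int,
    start_col: Optional[int] = None,
    end_col: Optional[int] = None,
) -> str:
    """Return a SHA-256 hex digest of the flagged source text plus the rule ID.

    Assumed conventions (hypothetical, for this sketch only): `line` is
    1-based; `start_col` and `end_col` are 0-based, end-exclusive offsets
    into that line.
    """
    text = source.splitlines()[line - 1]
    if start_col is not None:
        # Narrow to the flagged columns so that two findings on the same
        # line hash different text.
        text = text[start_col:end_col]
    # Strip surrounding whitespace so re-indentation does not change the digest.
    snippet = text.strip()
    payload = "\n".join([rule_id, snippet]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fingerprint_file(
    file_path: str,
    rule_id: str,
    line: int,
    start_col: Optional[int] = None,
    end_col: Optional[int] = None,
) -> str:
    """Read `file_path` and fingerprint the given location (thin wrapper)."""
    with open(file_path, encoding="utf-8") as handle:
        return source_fingerprint(handle.read(), rule_id, line, start_col, end_col)
```

Because the line number is only used to locate the text and is not part of the digest, moving the flagged line leaves the fingerprint unchanged. Reading by column separates two findings on one line as long as their text differs. Two textually identical occurrences would still collide in this sketch, so a real implementation would likely need an extra disambiguator such as an occurrence index.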
What does success look like, and how can we measure that?
Vulnerability occurrence fingerprints are stable even when the line number has changed
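One possible way to measure this, sketched here as an assumption rather than an agreed metric: run an analyzer on a revision and again on a copy in which only line positions changed (for example, blank lines inserted at the top of each file), then compute the share of fingerprints that survive. The helper name fingerprint_retention is hypothetical.

```python
from typing import Iterable


def fingerprint_retention(before: Iterable[str], after: Iterable[str]) -> float:
    """Fraction of fingerprints from `before` that are still present in `after`."""
    before_set = set(before)
    if not before_set:
        # No prior fingerprints, so none can have been lost.
        return 1.0
    return len(before_set & set(after)) / len(before_set)
```

A retention of 1.0 on a line-shift-only change would indicate that fingerprints no longer depend on line numbers.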