Design: Change vulnerability feedback identification

Problem to solve

The vulnerability feedback model currently relies on a fingerprint to match it with vulnerability findings. Though, this fingerprint is actually a SHA1 of the vulnerabilities[].cve property (AKA compare key) of the JSON Report format, and we want to get rid of this legacy property, so we need to find another approach for the identification of the vulnerability feedback.

Intended users

Further details

The vulnerability feedback model relies on the project_fingerprint property to match it with vulnerability findings. This approach allows:

to match feedback with both vulnerability records stored in the database for the default_branch and vulnerability objects reported in JSON artifacts for other branches (and MR).
to have a stable identifier that is similar between different executions of the analyzer, to keep matching the feedback with the vulnerability (e.g. between pipelines)

Though, due to how we generate this fingerprint, #2 is not as stable as expected and is also not consistent between the different analyzers. It also sometimes relies on proprietary properties that aren't part of the common format and thus, not sent to the rails backend.

Proposal

The goal of the feedback (dismissal, issue, MR) is to attach information to a vulnerability finding and make sure this information follows it over time. E.g. when we dismiss a vulnerability, we want it to stay dismissed as long as it is reported by the scans.

This goal is actually very similar to being able to track a vulnerability finding over time, in a stable way. So the proposal is to stick to the exact same identification scheme:

report type
primary identifier fingerprint
location fingerprint

By having the same identification between vulnerability findings and feedback, we ensure that we'll have the same stability, and thus that we'll keep the feedback tied to the findings at least as much as we're capable of tracking the finding. This also means that any improvement we can do on the vulnerability finding tracking can also be applied to the feedback.

Migration of existing data

The feedback feature is still in the alpha stage, so data loss is acceptable but we should try to migrate existing data if possible.

We could loop through existing feedback records that are linked to a vulnerability occurrence record (so feedback created from the default_branch) and from it, extract the report type, primary identifier fingerprint, and location fingerprint to generate the new fingerprint used to do the matching.

We then could loop through the remaining feedback records that are not linked to occurrence record (so feedback created from MR or pipeline view for feature branches), load the corresponding JSON artifact (we know the pipeline id), find the corresponding vulnerability and extract the report type, primary identifier fingerprint, and location fingerprint to generate the new fingerprint used to do the matching. Some artifacts may be expired and there would then be no way to migrate the feedback.

We can also discuss the relevance of trying to implement a backward-compatible finder that would rely on the old fingerprint.

Permissions and Security

Documentation

The feedback is not well documented today but we could add such information to the 3rd party integration doc: https://docs.gitlab.com/ee/development/integrations/secure/

Availability & Testing

The migration should be well tested.

What does success look like, and how can we measure that?

The vulnerability feedback is identified the same way as the vulnerability findings.

What is the type of buyer?

GitLab Ultimate

Links / references

First approach (deprecated)

Add two columns to vulnerability_feedback (primary identifier fingerprint, location fingerprint) (weight: 2)
Expose primary identifier fingerprint, location fingerprint when vulnerabilities displayed (we currently don't expose this in our api end points) (weight: 3)
Make rails populate this two columns whenever new feedback is added (this could involve front endbecause we don't know how to make relationship between vulnerability_feedback and vulnerability_occurrence only way to do that is when we create feedback FE needs to send primary identifier and location fingerprint ) (weight: 3)
Change application logic in MR widget, security dashboards, first class vulnerability and vulnerability api end points, to use new information to match between vulnerability_occurrence and vulnerability_feedback (weight: 5)
Write background migration to populate added columns (weight: 5)
MR widget still hits Projects::VulnerabilityFeedbackController#index to get vulnerability feedback information this needs to be cleaned up since we get this information from backend (frontend work), in addition, we can disable reactive cache whenever we add vulnerability feedback otherwise user won't see change immediately see this for additional part (!18960 (merged)) (for invalidating cache part its weight: 2)
Deprecate project_fingerprint column from vulnerability_feedback and vulnerability_occurrence table (weight: 3)

Second approach (favoured)

Add one column called feedback_fingerprint to vulnerability_feedback and vulnerability_occurrence (weight: 2)
calculate feedback_print (calculation is in below code snipped) (weight: 2)
Make rails populate new feedback_fingerprint column whenever new occurrence is added to db and reported in artifact (weight: 3)
Make rails populate new feedback_fingerprint column whenever new feedback is added. (this could involve front endbecause we don't know how to make relationship between vulnerability_feedback and vulnerability_occurrence only way to do that is when we create feedback FE needs to send primary identifier and location feedback_fingerprint ) (weight: 3)
Write background migration to populate added columns for existing vulnerability_feedback and vulnerability_occurrence (weight: 5)
Change application logic in MR widget, security dashboards, first class vulnerability and vulnerability api end points, to use new information to match between vulnerability_occurrence and vulnerability_feedback (weight: 5)
MR widget still hits Projects::VulnerabilityFeedbackController#index to get vulnerability feedback information this needs to be cleaned up since we get this information from backend (frontend work), in addition, we can disable reactive cache whenever we add vulnerability feedback otherwise user won't see change immediately see this for additional part (!18960 (merged)) (for invalidating cache part its weight: 2)
Deprecate project_fingerprint column from vulnerability_feedback and vulnerability_occurrence table (weight: 3)

⚠ First class vulnerabilities will replace security dashboard, instead of using vulnerability_occurrence we will use vulnerability table ⚠

We could add project_id to feedback_fingerprint calculation for both.

Calculating feedback_fingerprint

         def generate_feedback_fingerprint
            res = [primary_identifier&.fingerprint, location&.fingerprint, report_type.to_s].compact.join("")
            Digest::SHA256.hexdigest(res)
          end

Alternative for calculating feedback_fingerprint

This method raised some concerns for stability of identifiers. Identifiers might change frequently therefore if we consider all of them there is higher chance we might lose feedback.

         def generate_feedback_fingerprint
            res = [joined_identifiers, location&.fingerprint, report_type.to_s].compact.join("")
            Digest::SHA256.hexdigest(res)
          end

          # converts and combines all identifers with xor
          # in that way order won't matter
          def joined_identifiers
            convert_to_binary = lambda { |input| input.unpack1("B*").to_i(2) }

            identifiers.map do |identifier|
              convert_to_binary.call("#{identifier.external_type}:#{identifier.external_id}")
            end.compact.reduce(:^).to_s
          end

Additional information

https://youtu.be/sgrQFF_JuqQ video explains how project_fingerprint is used currently

Current records in database

explain SELECT count(id) FROM vulnerability_feedback

Currently we have 5590 vulnerability_feedback in gitlab.com

explain select distinct vf.id from vulnerability_feedback vf inner join vulnerability_occurrences vo on ENCODE(vo.project_fingerprint, 'HEX') = vf.project_fingerprint and vf.project_id = vo.project_id AND vo.report_type = vf.category

4697 vulnerability_feedback coming from occurrences which are saved in db

explain SELECT DISTINCT a.project_fingerprint FROM vulnerability_feedback a INNER JOIN ci_pipelines b ON a.pipeline_id = b.id AND b.ref <> 'master' AND a.project_id = b.project_id

~1695 vulnerability_feedback coming from occurrences which are coming from db

Payload that is send from front end when we create vulnerability feedback


category: "dependency_scanning"
feedback_type: "dismissal"
pipeline_id: 115877433
project_fingerprint: "90034c121a4fda23e2ab87911b6692ca2a30f413"
vulnerability_data: {id: null, report_type: "dependency_scanning",…}
	id: null
	report_type: "dependency_scanning"
	name: "A prototype pollution vulnerability in handlebars may lead to remote code execution if an attacker can control the template in handlebars"
	severity: "high"
	confidence: "unknown"
	scanner: {external_id: "retire.js", name: "Retire.js"}
	identifiers: [{external_type: "retire.js", external_id: "baf1b2b5f9a7c1dc0fb152365126e6c3",…}]
	project_fingerprint: "90034c121a4fda23e2ab87911b6692ca2a30f413"
	create_vulnerability_feedback_issue_path: "/caneldem/security-reports2/-/vulnerability_feedback"
	 create_vulnerability_feedback_merge_request_path: "/caneldem/security-reports2/-/vulnerability_feedback"
	 create_vulnerability_feedback_dismissal_path: "/caneldem/security-reports2/-/vulnerability_feedback"
	 project: {id: 16583999, name: "Security Reports2", full_path: "/caneldem/security-reports2",…}
	 dismissal_feedback: null
	 issue_feedback: null
	 merge_request_feedback: null
	 description: null
	 links: [{url: "https://snyk.io/vuln/SNYK-JS-HANDLEBARS-174183"},…]
	 location: {file: "package.json", dependency: {package: {name: "handlebars"}, version: "4.0.11"}}
	 remediations: [null]
	 solution: null
	 state: "detected"
	 blob_path: "/caneldem/security-reports2/-/blob/1875d04db72f89e37fa7f6b854b06569265ae5cd/package.json"
	 category: "dependency_scanning"
title: "A prototype pollution vulnerability in handlebars may lead to remote code execution if an attacker can control the template in handlebars"

Edited Aug 16, 2020 by Thiago Figueiró