Draft: Improve Vulnerability Tracking: proof of concept (!45339) · Merge requests · GitLab.org / GitLab

James Johnson requested to merge d0c-s4vage-use_ctag_scope_vuln_tracking into master Oct 15, 2020

This is the proof-of-concept MR for the Improve Vulnerability Tracking: MVP: Scope+Offset epic: https://gitlab.com/groups/gitlab-org/-/epics/4690. It is NOT intended to be merged directly into GitLab. This MR will be split into separate MRs and then iterated upon:

Database: !52720 (merged)	Backend I (Store): !54515 (closed)	Backend II: (Use+Feature Flag): !54608 (merged)	EE Documentation: !52942 (closed)

What does this MR do?

The changes in this merge request allow multiple fingerprinting algorithms to be used to compare vulnerabilities in a migration-less manner.

This lets us develop better fingerprinting algorithms while keeping backwards compatibility with existing vulnerability fingerprints.

High-Level Details

This is done by:

Adding a new vulnerability_tracking_fingerprints table that enables one-to-many relationships between findings and fingerprints
Using the tracking field in the security report
Depending on a post-analyzer to calculate and insert the actual fingerprint value in a pipeline (see this epic https://gitlab.com/groups/gitlab-org/-/epics/4690)

This merge request DOES NOT contain the implementation for the scope+offset calculation, it only uses the values inserted into the security report JSON during the pipeline (if they exist).

Technical Details

Security Report JSON Tracking Field (expand me)

The tracking field schema assumed in this MR has the following form:

"tracking": {
  "type": "source",
  "positions": [
    { "file": "path/to/file1.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file1.ext|scope1|scope2:offset" } },
    { "file": "path/to/file2.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file2.ext|scope1|scope2:offset" } },
    { "file": "path/to/file3.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file3.ext|scope1|scope2:offset" } }
  ]
}

The tracking field is originally (ideally) created by an analyzer for each vulnerability. The fingerprints subfield is added by a post-analyzer.

If the tracking field does not exist, the behavior is to fall back to hashing data from the location field in the same manner as before.

See #233167 (closed) and #293706 (closed) for details on the tracking field

Multiple Fingerprints Per Vulnerability (expand me)

A new table, vulnerability_tracking_fingerprints, is added that creates a one-to-many relationship between findings and vulnerability tracking fingerprints.

Each tracking fingerprint record associated with a finding contains:

Field	Notes
`finding_id`	A foreign key to the finding
`sha`	A sha of the tracking fingerprint
`track_type`	The type of tracking (`hash` or `source`) This is the declared tracking type from the analyzer
`track_method`	This is NOT set by the analyzer, but is the specific tracking algorithm used to fulfill `track_type`. `hash` and `scope+offset` are examples of this

Additionally, the code itself has priorities set for each tracking method. This is NOT set by the analyzer and is used to compare tracking fingerprints between vulnerabilities in descending priority. The priority is intended to be a relative accuracy rating. Higher priority tracking fingerprints should be more accurate than lower priority tracking fingerprints.

Comparing Vulnerabilities with Fingerprint Priorities (expand me)

Each tracking fingerprint method has a priority defined in the GitLab backend. Vulnerabilities with matching tracking fingerprints are compared in descending order.

One Matching Lower-Priority Fingerprint

For example, Vuln1's Source:Hash tracking fingerprint below is the only match with any of Vuln2's tracking fingerprints. Since this is the first (and only) match, they are considered the same vulnerability.

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint22[fingerprint2]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  type12 -->|matches| type22
  method12 -->|matches| method22
  value12 -->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f9f,stroke:#333,stroke-width:4px;

  class fingerprint22 green
  class fingerprint12 green

Two Matching Fingerprints

In this next example, Vuln1 and Vuln2 have several matching fingerprints. However, the first and highest priority (most accurate) match is with the Source:Scope+Offset tracking fingerprint:

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      priority11["Priority:2"]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      priority12["Priority:1"]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint21[fingerprint1]
      priority21["Priority:2"]
      type21["Type:Source"]
      method21["Method:Scope+Offset"]
      value21["Hash:AAAA"]
    end
    subgraph fingerprint22[fingerprint2]
      priority22["Priority:1"]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  priority11-->|matches| priority21
  type11 -->|matches| type21
  method11 -->|matches| method21
  value11 -->|matches| value21

  priority12-.->|matches| priority22
  type12 -.->|matches| type22
  method12 -.->|matches| method22
  value12 -.->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f9f,stroke:#333,stroke-width:4px;

  class fingerprint11 green
  class fingerprint21 green

Non-Matching High Priority Fingerprints

Prioritized tracking fingerprints are also used to disprove matches. In the example below, Vuln1 and Vuln2 both have matching low-priority Source:Hash fingerprints, but the high-priority Source:Scope+Offset tracking fingerprints do not match

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      priority11["Priority:2"]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      priority12["Priority:1"]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint21[fingerprint1]
      priority21["Priority:2"]
      type21["Type:Source"]
      method21["Method:Scope+Offset"]
      value21["Hash:BBBB"]
    end
    subgraph fingerprint22[fingerprint2]
      priority22["Priority:1"]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  priority11-->|"✗"| priority21
  type11 -->|"✗"| type21
  method11 -->|"✗"| method21
  value11 -->|"✗"| value21

  priority12-.->|matches| priority22
  type12 -.->|matches| type22
  method12 -.->|matches| method22
  value12 -.->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f99,stroke-width:4px,font-weight:bold;

  class fingerprint11 red
  class fingerprint21 red

Notes

The changes in this MR are intended to be gated by a feature flag to make testing/rolling out this new feature easier.

Additionally, this MR will likely be split into three MRs:

Database changes (no feature flag, need the table to be added)
Documentation changes
Backend changes (with feature-flag)

If you have questions/comments about a specific topic, please use discoto to keep the discussion organized:

Create new distinct topics (threads) if one doesn't already exist
Comment on existing topics if one already exists

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

Auto-Summary 🤖

Discoto Usage

Points

Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive) point:. For example, the following are all valid points:

#### POINT: This is a point

* point: This is a point

+ Point: This is a point

- pOINT: This is a point

point: This is a **point**

Note that any markdown used in the point text will also be propagated into the topic summaries.

Topics

Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.

Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive) topic:. For example, the following are all valid topics:

# Topic: Inline discussion topic 1

## TOPIC: **{+A Green, bolded topic+}**

### tOpIc: Another topic

Quick Actions

Action Description

/discuss sub-topic TITLE Create an issue for a sub-topic. Does not work in epics

/discuss link ISSUABLE-LINK Link an issuable as a child of this discussion

Action	Description
`/discuss sub-topic TITLE`	Create an issue for a sub-topic. Does not work in epics
`/discuss link ISSUABLE-LINK`	Link an issuable as a child of this discussion

Last updated by this job

⏫ ROOT Improve Vulnerability Tracking &4612 (closed)
◀ PARENT Improve Vulnerability Tracking: MVP: Scope+Offset https://gitlab.com/groups/gitlab-org/-/epics/4690
TOPIC Simplicity and elegance !45339 (comment 443449912)
TOPIC Portability and/or modularity !45339 (comment 443449955)
TOPIC Supportability !45339 (comment 443450003)
TOPIC Maintainability !45339 (comment 443450055)
TOPIC Scalability !45339 (comment 443450085)
TOPIC Extensibility !45339 (comment 443450108)
TOPIC Reliability !45339 (comment 443450124)
TOPIC Security !45339 (comment 443450151)
TOPIC Cost !45339 (comment 443450264)
- No cost changes !45339 (comment 448382135)
TOPIC Performance !45339 (comment 443450287)
- CompareSecurityReportsService x1000 (None existing): BEFORE: TODO !45339 (comment 448368619)
- CompareSecurityReportsService x1000 (None existing): _AFTER: TODO seconds (+-%) !45339 (comment 448368619)
- CompareSecurityReportsService x1000 (x1000 existing): _AFTER: TODO seconds (+-%) !45339 (comment 448368752)
- CompareSecurityReportsService x1000 (x1000 existing): BEFORE: TODO seconds !45339 (comment 448368752)
- StoreReportService x1000 (None existing): _AFTER: TODO (+-%) !45339 (comment 448368966)
- StoreReportService x1000 (None existing): BEFORE: TODO !45339 (comment 448368966)
- StoreReportService x1000 (x1000 existing): BEFORE: TODO !45339 (comment 448369064)
- StoreReportService x1000 (x1000 existing): _AFTER: TODO (+-%) !45339 (comment 448369064)
TOPIC Consistency !45339 (comment 443450329)
TOPIC Related Resources/Issues !45339 (comment 445286307)

Discoto Settings

---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending

See the settings schema for details.

Edited Mar 24, 2021 by 🤖 GitLab Bot 🤖

Draft: Improve Vulnerability Tracking: proof of concept