Skip to content

Draft: Improve Vulnerability Tracking: proof of concept

James Johnson requested to merge d0c-s4vage-use_ctag_scope_vuln_tracking into master

This is the proof-of-concept MR for the Improve Vulnerability Tracking: MVP: Scope+Offset epic: https://gitlab.com/groups/gitlab-org/-/epics/4690. It is NOT intended to be merged directly into GitLab. This MR will be split into separate MRs and then iterated upon:

Database: !52720 (merged) Backend I (Store): !54515 (closed) Backend II: (Use+Feature Flag): !54608 (merged) EE Documentation: !52942 (closed)

What does this MR do?

The changes in this merge request allow multiple fingerprinting algorithms to be used to compare vulnerabilities in a migration-less manner.

This lets us develop better fingerprinting algorithms while keeping backwards compatibility with existing vulnerability fingerprints.

High-Level Details

This is done by:

This merge request DOES NOT contain the implementation for the scope+offset calculation, it only uses the values inserted into the security report JSON during the pipeline (if they exist).

Technical Details

Security Report JSON Tracking Field (expand me)

The tracking field schema assumed in this MR has the following form:

"tracking": {
  "type": "source",
  "positions": [
    { "file": "path/to/file1.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file1.ext|scope1|scope2:offset" } },
    { "file": "path/to/file2.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file2.ext|scope1|scope2:offset" } },
    { "file": "path/to/file3.ext", "line_start": 10, "line_end": 20, "fingerprints": { "scope+offset": "path/to/file3.ext|scope1|scope2:offset" } }
  ]
}

The tracking field is originally (ideally) created by an analyzer for each vulnerability. The fingerprints subfield is added by a post-analyzer.

If the tracking field does not exist, the behavior is to fall back to hashing data from the location field in the same manner as before.

See #233167 (closed) and #293706 (closed) for details on the tracking field

Multiple Fingerprints Per Vulnerability (expand me)

A new table, vulnerability_tracking_fingerprints, is added that creates a one-to-many relationship between findings and vulnerability tracking fingerprints.

Each tracking fingerprint record associated with a finding contains:

Field Notes
finding_id
  • A foreign key to the finding
sha
  • A sha of the tracking fingerprint
track_type
track_method
  • This is NOT set by the analyzer, but is the specific tracking algorithm used to fulfill track_type.
  • hash and scope+offset are examples of this

Additionally, the code itself has priorities set for each tracking method. This is NOT set by the analyzer and is used to compare tracking fingerprints between vulnerabilities in descending priority. The priority is intended to be a relative accuracy rating. Higher priority tracking fingerprints should be more accurate than lower priority tracking fingerprints.

Comparing Vulnerabilities with Fingerprint Priorities (expand me)

Each tracking fingerprint method has a priority defined in the GitLab backend. Vulnerabilities with matching tracking fingerprints are compared in descending order.

One Matching Lower-Priority Fingerprint

For example, Vuln1's Source:Hash tracking fingerprint below is the only match with any of Vuln2's tracking fingerprints. Since this is the first (and only) match, they are considered the same vulnerability.

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint22[fingerprint2]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  type12 -->|matches| type22
  method12 -->|matches| method22
  value12 -->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f9f,stroke:#333,stroke-width:4px;

  class fingerprint22 green
  class fingerprint12 green

Two Matching Fingerprints

In this next example, Vuln1 and Vuln2 have several matching fingerprints. However, the first and highest priority (most accurate) match is with the Source:Scope+Offset tracking fingerprint:

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      priority11["Priority:2"]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      priority12["Priority:1"]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint21[fingerprint1]
      priority21["Priority:2"]
      type21["Type:Source"]
      method21["Method:Scope+Offset"]
      value21["Hash:AAAA"]
    end
    subgraph fingerprint22[fingerprint2]
      priority22["Priority:1"]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  priority11-->|matches| priority21
  type11 -->|matches| type21
  method11 -->|matches| method21
  value11 -->|matches| value21

  priority12-.->|matches| priority22
  type12 -.->|matches| type22
  method12 -.->|matches| method22
  value12 -.->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f9f,stroke:#333,stroke-width:4px;

  class fingerprint11 green
  class fingerprint21 green

Non-Matching High Priority Fingerprints

Prioritized tracking fingerprints are also used to disprove matches. In the example below, Vuln1 and Vuln2 both have matching low-priority Source:Hash fingerprints, but the high-priority Source:Scope+Offset tracking fingerprints do not match

graph LR
  subgraph Vuln1
    subgraph fingerprint11[fingerprint1]
      priority11["Priority:2"]
      type11["Type:Source"]
      method11["Method:Scope+Offset"]
      value11["Hash:AAAA"]
    end
    subgraph fingerprint12[fingerprint2]
      priority12["Priority:1"]
      type12["Type:Source"]
      method12["Method:Hash"]
      value12["Hash:XXXX"]
    end
  end

  subgraph Vuln2
    subgraph fingerprint21[fingerprint1]
      priority21["Priority:2"]
      type21["Type:Source"]
      method21["Method:Scope+Offset"]
      value21["Hash:BBBB"]
    end
    subgraph fingerprint22[fingerprint2]
      priority22["Priority:1"]
      type22["Type:Source"]
      method22["Method:Hash"]
      value22["Hash:XXXX"]
    end
  end

  priority11-->|"✗"| priority21
  type11 -->|"✗"| type21
  method11 -->|"✗"| method21
  value11 -->|"✗"| value21

  priority12-.->|matches| priority22
  type12 -.->|matches| type22
  method12 -.->|matches| method22
  value12 -.->|matches| value22

  classDef green fill:#9f9,stroke-width:4px,font-weight:bold;
  classDef red fill:#f99,stroke-width:4px,font-weight:bold;

  class fingerprint11 red
  class fingerprint21 red

Notes

The changes in this MR are intended to be gated by a feature flag to make testing/rolling out this new feature easier.

Additionally, this MR will likely be split into three MRs:

  • Database changes (no feature flag, need the table to be added)
  • Documentation changes
  • Backend changes (with feature-flag)

If you have questions/comments about a specific topic, please use discoto to keep the discussion organized:

  • Create new distinct topics (threads) if one doesn't already exist
  • Comment on existing topics if one already exists

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Auto-Summary 🤖

Discoto Usage

Points

Discussion points are declared by headings, list items, and single lines that start with the text (case-insensitive) point:. For example, the following are all valid points:

  • #### POINT: This is a point
  • * point: This is a point
  • + Point: This is a point
  • - pOINT: This is a point
  • point: This is a **point**

Note that any markdown used in the point text will also be propagated into the topic summaries.

Topics

Topics can be stand-alone and contained within an issuable (epic, issue, MR), or can be inline.

Inline topics are defined by creating a new thread (discussion) where the first line of the first comment is a heading that starts with (case-insensitive) topic:. For example, the following are all valid topics:

  • # Topic: Inline discussion topic 1
  • ## TOPIC: **{+A Green, bolded topic+}**
  • ### tOpIc: Another topic

Quick Actions

Action Description
/discuss sub-topic TITLE Create an issue for a sub-topic. Does not work in epics
/discuss link ISSUABLE-LINK Link an issuable as a child of this discussion

Last updated by this job

Discoto Settings
---
summary:
  max_items: -1
  sort_by: created
  sort_direction: ascending

See the settings schema for details.

Edited by 🤖 GitLab Bot 🤖

Merge request reports