Tracking Calculator (Post-analyzer Frontend): tracking-calculator deduplicates fingerprints that originate from different algorithms for the same findings

Summary

When generating reports with tracking-calculator, we deduplicate tracking signatures/fingerprints based on their value.

Steps to reproduce

Add the sample below as a file test.cpp to a GitLab project with SAST enabled.

#include <iostream>
#include <cstring>

int test() {
    char x[20];
    char y[20];
    int i = 10;
    memcpy(x, y, i); 
}

The following entry in the gl-sast-report.json will be created:

"tracking": {
        "type": "source",
        "items": [
          {
            "file": "test.cpp",
            "line_start": 8,
            "line_end": 8,
            "signatures": [
              {
                "algorithm": "scope_offset",
                "value": "test.cpp|test()[0]:4"
              }
            ]
          }
        ]
      }

This entry only includes the scope_offset signature because the deduplication step omits the addition of scope_offset_compressed when its value matches one that has already been added. While the scope_offset and scope_offset_compressed values are identical in this instance, they may not be in the future.
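For illustration, this is a sketch of the entry one would expect if deduplication were not applied, assuming scope_offset_compressed happens to produce the same value for this finding (the second signature is hypothetical and only shown for comparison):

```json
"tracking": {
  "type": "source",
  "items": [
    {
      "file": "test.cpp",
      "line_start": 8,
      "line_end": 8,
      "signatures": [
        {
          "algorithm": "scope_offset",
          "value": "test.cpp|test()[0]:4"
        },
        {
          "algorithm": "scope_offset_compressed",
          "value": "test.cpp|test()[0]:4"
        }
      ]
    }
  ]
}
```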

Example Project

https://gitlab.com/julianthome/cpptest3

What is the current bug behavior?

This bug relates to a previous discussion about how we compare fingerprints here.

If we are only comparing fingerprints belonging to the same type, the comparison is bound by the rules of the actual vulnerability tracking algorithm. Every algorithm comes with its own definition of vulnerability identity that is ultimately encoded in the fingerprint. Applying comparisons across algorithms of different types essentially breaks the individual definitions of vulnerability identity for all algorithms; instead, we are implicitly re-defining vulnerability identity as the union of the definitions provided by the algorithms.

This approach creates friction with regards to the integration of new algorithms because every developer who adds a new tracking algorithm has to know about all the other algorithms to anticipate how the newly added algorithm may behave.

If we have two findings with two related fingerprints that were both generated by the same algorithm, they share the parameters of that tracking algorithm, so we can compare them and, based on the comparison, judge whether or not they refer to the same vulnerability finding according to the algorithm used.

If we have two findings with two related fingerprints, one from scope_offset and the other from scope_offset_compressed, and both values match, we cannot tell whether they refer to the same or a different vulnerability finding; fingerprints of different algorithms are essentially incomparable. This point also relates to deduplication because, at the moment, we do not add scope_offset_compressed if an identical scope_offset fingerprint has already been added.
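The problematic behavior can be sketched as follows. This is a hypothetical illustration, not the actual tracking-calculator code: signatures are dropped whenever an identical value has already been emitted, regardless of which algorithm produced them, which is how scope_offset_compressed gets lost.

```python
# Hypothetical sketch of the current cross-algorithm deduplication:
# a signature is dropped whenever an identical value was already
# emitted, regardless of the algorithm that produced it.
def dedup_by_value(signatures):
    """Keep only the first signature for each distinct value."""
    seen = set()
    kept = []
    for sig in signatures:
        if sig["value"] in seen:
            continue
        seen.add(sig["value"])
        kept.append(sig)
    return kept


signatures = [
    {"algorithm": "scope_offset", "value": "test.cpp|test()[0]:4"},
    # scope_offset_compressed happens to produce the same value here,
    # but the two algorithms could diverge in the future:
    {"algorithm": "scope_offset_compressed", "value": "test.cpp|test()[0]:4"},
]

# The scope_offset_compressed signature is silently dropped, even though
# it encodes a different notion of vulnerability identity.
print(dedup_by_value(signatures))
```

A per-algorithm comparison would instead key on the pair (algorithm, value), keeping both signatures and leaving each algorithm's identity definition intact.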

In addition, if we want to fully benefit from the changes in Rails Backend: Vulnerability fingerprints are c... (#470170 - closed), it would be beneficial to have all signatures available.

What is the expected correct behavior?

All fingerprints for all algorithms should be added to all findings if they are computable.

Possible fixes

Disable deduplication in tracking-calculator by default.
