Run data-analysis comparing old and new versions of scope_offset_compressed
TL;DR Evaluate the updated tracking algorithm scope_offset_compressed (v2) to rule out a bad user experience and performance issues on the platform caused by too many generated vulnerabilities. The goal is to run the tracking algorithm on a real project by replaying its Git history and to record the progression of counted vulnerabilities over time (per algorithm), which is the strategy we applied in the past for evaluating tracking algorithms.
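The replay strategy above can be sketched in a few lines. This is a minimal illustration, not the actual experiment harness: `count_vulnerabilities` is a hypothetical stand-in for invoking tracking-calculator at a given commit, and the real run would shell out to the tool and a scanner instead.

```python
def count_vulnerabilities(commit: str, algorithm: str) -> int:
    """Placeholder: return the number of tracked vulnerabilities at `commit`
    for `algorithm`. In the real experiment this would invoke
    tracking-calculator; the dummy value below is for illustration only."""
    return len(commit) % 5


def replay(commits: list[str], algorithms: list[str]) -> dict[str, list[int]]:
    """Walk the project's Git history in chronological order and record,
    per algorithm, the progression of vulnerability counts over time."""
    progression: dict[str, list[int]] = {algo: [] for algo in algorithms}
    for commit in commits:
        for algo in algorithms:
            progression[algo].append(count_vulnerabilities(commit, algo))
    return progression
```

The resulting per-algorithm count series is exactly the data we plot and compare across scope_offset, scope_offset_compressed (v1), and scope_offset_compressed (v2).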
Problem Description
Advanced Vulnerability Tracking supports different types of tracking algorithms. At the moment we have two algorithms: scope_offset and scope_offset_compressed.
When we implemented the first tracking algorithm, scope_offset, we performed a data-analysis to measure its impact in terms of deduplication performance. The motivation for this effort was to understand and measure the impact of vulnerability tracking on the user experience and the potential performance penalty. This experiment was also an additional safeguard to ensure that we do not produce too many vulnerabilities due to a bug in the implementation, which, in the worst case, could have a bad effect on the whole platform. We used SourceWarp to perform the end-to-end test under realistic conditions.
scope_offset_compressed is an enhanced version of scope_offset that ignores non-functional source code lines when computing the offset component. The initial implementation of scope_offset_compressed (v1) included a bug that led to falsely underreported vulnerabilities.
For scope_offset_compressed (v1) we performed a data-analysis similar to the one we performed for scope_offset. However, due to the bug mentioned above, the numbers are probably not accurate, so we have to redo the experiment. Ideally, we would integrate the test right into the CI/CD pipeline of https://gitlab.com/gitlab-org/security-products/post-analyzers/tracking-calculator so that we can get an idea about the impact of algorithmic updates before merging changes.
The scope_offset_compressed (v1) bug that led to underreporting was fixed in https://gitlab.com/gitlab-org/security-products/post-analyzers/tracking-calculator/-/merge_requests/83+s; we call the fixed version scope_offset_compressed (v2). scope_offset_compressed (v2) will lead to an increased number of vulnerabilities compared to scope_offset_compressed (v1). Hence, we expect the counts for scope_offset_compressed (v2) to fall between scope_offset_compressed (v1) as the lower bound and scope_offset as the upper bound. That discussion states that we have to collect evidence that this is really the case to avoid (1) a bad user experience and (2) a bad performance impact on the platform.
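The hypothesis reduces to a simple invariant over the final counts per algorithm. A minimal sketch, assuming we already have the three totals from a replay run (the function name is ours, not part of any existing tooling):

```python
def counts_within_expected_bounds(v1_count: int, v2_count: int,
                                  scope_offset_count: int) -> bool:
    """Check the expected ordering: scope_offset_compressed (v1) is the
    lower bound and scope_offset is the upper bound for the count
    produced by scope_offset_compressed (v2)."""
    return v1_count <= v2_count <= scope_offset_count
```

If this check fails on the experimental data, either the hypothesis is wrong or the v2 implementation still has a bug; both outcomes would need investigation before rollout.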
What we aim to deliver
- Experimental data that illustrates the impact of scope_offset_compressed (v2) in comparison to scope_offset_compressed (v1) and scope_offset. We highlight the extent to which scope_offset_compressed (v2) produces new vulnerabilities in comparison to scope_offset_compressed (v1). We'll include the execution time, too.
- A playbook for re-running the data-analysis for future changes to vulnerability tracking. This is relevant because in https://gitlab.com/gitlab-org/gitlab/-/issues/478500+s we identified some bugs, and for each of the fixes we will likely have to run the same experiment again.
- Integration of an automated test into the CI pipeline of tracking-calculator.
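Since the deliverables include execution time per algorithm, the replay harness should record wall-clock time alongside counts. A small hedged sketch of such a timing wrapper (`run` stands in for whatever callable wraps the tracking-calculator invocation):

```python
import time


def timed_run(run, *args):
    """Execute one algorithm run and return (result, elapsed_seconds).
    `run` is a hypothetical callable wrapping the tracking-calculator
    invocation; perf_counter gives a monotonic wall-clock measurement."""
    start = time.perf_counter()
    result = run(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Aggregating `elapsed` per algorithm across the replayed commits yields the execution-time comparison mentioned above.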
Implementation steps
- Simplify https://gitlab.com/gitlab-org/vulnerability-research/foss/sourcewarp or https://gitlab.com/theoretick/replay to use Docker mode only and to make it work in conjunction with GDK as well as Docker.
- Run a one-off data-analysis using the tracking-calculator algorithms scope_offset and scope_offset_compressed (v1, v2) with the GitLab code base. We could use the same experimental setup as in https://gitlab.com/gitlab-org/secure/vulnerability-research/research/experiments/-/issues/1+s. We could either use a GDK GitLab instance and count the vulnerabilities in the vulnerability report as we progress through the timeline, or we could directly count the findings from the database. We could also leverage https://gitlab.com/gitlab-org/secure/static-analysis/vulntracking-simulator, which aims to simulate the backend behaviour.
- Write a playbook or script that enables us to easily re-run the experiment in the future.
- (Optional but very nice to have 😊) Implement an integration test as a CI job in tracking-calculator to compute the progression of vulnerability counts for the various algorithms over time. To keep the integration simple and lightweight, we could use https://gitlab.com/gitlab-org/secure/static-analysis/vulntracking-simulator to simulate the backend behaviour.
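For the optional CI job, one lightweight design is to compare the counts produced by a simulator run against a committed baseline and fail the job on large deviations. The sketch below is an assumption about how such a gate could look, not existing tooling; the baseline format and tolerance are ours:

```python
def find_regressions(current: dict[str, int], baseline: dict[str, int],
                     tolerance: float = 0.05) -> list[str]:
    """Return the algorithms whose final vulnerability count deviates from
    the committed baseline by more than `tolerance` (relative). An empty
    list means the CI job can pass."""
    regressions = []
    for algo, expected in baseline.items():
        actual = current.get(algo, 0)
        if expected and abs(actual - expected) / expected > tolerance:
            regressions.append(algo)
    return regressions
```

A CI job would load the baseline from the repository, run the simulator on a fixed test history, and exit non-zero if `find_regressions` returns anything, surfacing the impact of an algorithmic change before merge.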