Dependency Firewall for Package Registry: investigation

🔥 Problem

We would like to start implementing the Dependency Firewall for the package registry.

In essence, we want to add additional validations when a package enters the GitLab instance.

We're going to start with the security-related validations. Before jumping into the implementation, we need to know: what can be done?

🔍 Investigation

We're going to list a few ideas here. To better understand how they fit into the package registry flow, we describe a scenario for each of them.

As a refresher, the package registry flow is essentially made of two steps:

1. A package is published.
2. That package is pulled.

Those are two distinct operations performed by package manager clients. They don't need to happen back to back: we can process the package between (1.) and (2.). It is during that processing that we can hook in additional validations.
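To make the hook point concrete, here is a minimal sketch of processing a package between publication and pull. All names (`Package`, `Finding`, `process_package`, `has_a_version`) are hypothetical and only illustrate the shape of the flow; this is not how the package registry is implemented. The sketch only collects findings and deliberately does not decide what to do with them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Package:
    # Hypothetical, simplified view of a published package.
    name: str
    version: str
    dependencies: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Finding:
    validator: str
    message: str


# A validator takes a package and returns zero or more findings.
Validator = Callable[[Package], List[Finding]]


def process_package(package: Package, validators: List[Validator]) -> List[Finding]:
    """Run every validator after publication and before the package is pulled."""
    findings: List[Finding] = []
    for validate in validators:
        findings.extend(validate(package))
    return findings


def has_a_version(package: Package) -> List[Finding]:
    # Toy validator, present only to exercise the hook.
    if not package.version:
        return [Finding("has_a_version", f"{package.name} has no version")]
    return []


published = Package("my-lib", "0.1.0", [("junit", "1.2.3")])
findings = process_package(published, [has_a_version])
```

Each idea below would then be one more validator plugged into the same list.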

1. (Focus) Dependency scanning. Basically, GitLab's existing Dependency Scanning feature, but applied to a package.
- Scenario: I upload a package that has JUnit version 1.2.3 as a dependency. Version 1.2.3 of JUnit has a publicly known major vulnerability. This validator should catch that.
2. (Focus) Assuming that (1.) is implemented, how much work would it take to have continuous vulnerability scanning for a package?
- Scenario: I upload a package that has JUnit version 1.2.3 as a dependency. One week after this publication, version 1.2.3 of JUnit gets a major vulnerability. This part should catch that.
3. (Focus) Assuming that (1.) is implemented, how much work would it take to have a dependency list for a package?
- Scenario: I upload a package that has JUnit version 1.2.3 as a dependency. Once the package is available for pulling, if I browse the package registry and open the package details for my package, I see an SBOM list and JUnit version 1.2.3 is listed there.
- Note: not really a validator but an improvement to the package metadata that we display. Related issue: #448921.
4. Vulnerability scanning. Similar to (1.), but it is the package itself that is considered, not its dependencies.
- Scenario: I upload the package JUnit version 1.2.3. Version 1.2.3 of JUnit has a publicly known major vulnerability. This validator should catch that.
- We should probably use https://advisories.gitlab.com/ here.
5. SAST scanning. GitLab's existing SAST feature, but applied to a package's source code.
- Scenario: I upload an NPM package. The source code of this package contains a Code Injection issue. This validator should catch that.
- Note: This will only work on packages that include their source code. This is usually the case for interpreted languages such as JavaScript, Ruby, and Python. Under some conditions, it is also the case for Java packages.
6. Other tools. The idea here is to implement other validations that are not yet implemented as features in GitLab. For this, I would take PackageHunter as a candidate/example.
- Scenario: I upload an NPM package. That package has a pre-install script that pings a third-party server during the package installation. This validation should catch that.
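The two JUnit dependency scenarios above (a vulnerability known at publication time, and one disclosed a week later) can be sketched as the same check run against different advisory data. This is only an illustration: the `Advisories` structure and the `vulnerable_dependencies` helper are hypothetical, and real advisories describe affected version ranges rather than exact versions.

```python
from typing import Dict, List, Set, Tuple

Dependency = Tuple[str, str]  # (name, exact version)

# Hypothetical advisory data: package name -> set of affected versions.
# Real advisories use version ranges; exact versions keep the sketch short.
Advisories = Dict[str, Set[str]]


def vulnerable_dependencies(
    dependencies: List[Dependency], advisories: Advisories
) -> List[Dependency]:
    """Return the dependencies whose exact version is listed in an advisory."""
    return [
        (name, version)
        for name, version in dependencies
        if version in advisories.get(name, set())
    ]


# Dependency list extracted once, when the package is published.
deps = [("junit", "1.2.3"), ("other-lib", "4.5.6")]

# Scenario: the vulnerability is already public at publication time.
at_publish = vulnerable_dependencies(deps, {"junit": {"1.2.3"}})

# Scenario: nothing is known at publication time, then an advisory
# appears a week later and the stored dependency list is checked again.
before_advisory = vulnerable_dependencies(deps, {})
after_advisory = vulnerable_dependencies(deps, {"junit": {"1.2.3"}})
```

The point of the sketch is that the continuous case does not need the package file again, only the dependency list captured at publication and fresh advisory data.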

🔭 Scope

For each idea, we should answer these questions:

Have a sense of how the validator works.
How is the validator executed? Background job? In a specific environment like a CI job?
- Is the architecture used by the validator ready for scale? The package registry receives a pretty large number of uploads per day, so the approach used by the validator should be ready for that.
How are the results stored, or how could they be stored? Could we link them to a Package?
- Ready for scale? There is not really a limit on how many packages a project can host.
Is there a UI that could be reused to display those results?
Have a sense of how much time it takes for the validator to execute.
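As an illustration of the storage and timing questions, a result record could carry a reference to the package and the measured duration of the validator run. The field names below and the `run_timed` helper are hypothetical; they are one possible shape, not an existing GitLab model.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ValidationResult:
    package_id: int           # hypothetical link back to the package record
    validator: str            # which validator produced this result
    findings: List[str]       # human-readable findings, possibly empty
    duration_seconds: float   # how long the validator took to execute


def run_timed(
    package_id: int, name: str, validator: Callable[[], List[str]]
) -> ValidationResult:
    """Run one validator and record its findings together with its duration."""
    started = time.perf_counter()
    findings = validator()
    elapsed = time.perf_counter() - started
    return ValidationResult(package_id, name, findings, elapsed)


result = run_timed(
    42,
    "dependency_scanning",
    lambda: ["junit 1.2.3 is affected by a known advisory"],
)
```

Collecting `duration_seconds` per validator during the investigation would give the timing data asked for above.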

Please note that I am leaving one question out of scope: what do we do with the results?

That is a different question: how should the package registry react to results? We will probably need an issue for that.

Bonus points

Suggest an implementation order. Obviously, we're not going to implement all ideas at once. Which one should go first?
For (4.), it would be very helpful to know how https://advisories.gitlab.com/ is accessible from the Rails backend: do we need to make an API call? Is that database synced into the Rails database?

Edited May 15, 2024 by David Fernandez