Dependency Firewall for Package Registry: investigation
🔥 Problem
We would like to start the implementation of the Dependency Firewall for group::package registry.
In essence, we want to add additional validations when a package enters the GitLab instance.
We're going to start with the security-related validations. Before jumping into the implementation, we need to know: what can be done?
🔍 Investigation
We're going to list a few ideas here. To better understand how each one fits into the package registry flow, we describe a scenario for each of them.
As a refresher, the package registry flow is essentially made of two steps:
1. A package is published.
2. That package is pulled.
Those are two distinct operations done by package manager clients. We don't need to keep them in sync, in the sense that we can process the package between (1.) and (2.). It is during that package processing that we can hook additional validations.
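To make the flow above concrete, here is a minimal Python sketch of where validators could hook in, between publication and pull. Every name in it (`Package`, `Status`, `process`, the `PROCESSING` and `AVAILABLE` states) is hypothetical and does not reflect the actual GitLab models. It deliberately only records findings and does not react to them.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    # Hypothetical states, for illustration only.
    PROCESSING = "processing"  # published, validators have not finished yet
    AVAILABLE = "available"    # processing is done, the package can be pulled


@dataclass
class Package:
    name: str
    version: str
    status: Status = Status.PROCESSING
    findings: List[str] = field(default_factory=list)


# A validator receives a package and returns a list of findings (possibly empty).
Validator = Callable[[Package], List[str]]


def process(package: Package, validators: List[Validator]) -> Package:
    """Run every validator between step (1.) publish and step (2.) pull."""
    for validate in validators:
        package.findings.extend(validate(package))
    # This sketch only records findings; it does not decide what to do with them.
    package.status = Status.AVAILABLE
    return package


def can_pull(package: Package) -> bool:
    """A package is only served once processing has finished."""
    return package.status is Status.AVAILABLE
```

Whether the validators run inline, in a background job, or in a CI job is left open here. The sketch only shows that they sit between the two steps.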
1. Dependency scanning. Basically, the dependency scanning feature, but working for a package.
   - Scenario: I upload a package that has JUnit version 1.2.3 as a dependency. Version 1.2.3 of JUnit has a publicly known major vulnerability. This validator should catch that.
2. Assuming that (1.) is implemented, how much work is there to have continuous vulnerability scanning, but for a package?
   - Scenario: I upload a package that has JUnit version 1.2.3 as a dependency. One week after this publication, version 1.2.3 of JUnit gets a major vulnerability. This part should catch that.
3. Assuming that (1.) is implemented, how much work is there to have a dependency list, but for a package?
   - Scenario: I upload a package that has JUnit version 1.2.3 as a dependency. Once the package is available for pulling, if I browse the package registry and open the package details for my package, I see an SBOM list and JUnit version 1.2.3 is listed there.
   - Note: not really a validator but an improvement on the package metadata that we display. Related issue: #448921.
4. Vulnerability Scanning. Similar to (1.), but it is the package itself that is considered, not its dependencies.
   - Scenario: I upload the package JUnit version 1.2.3. Version 1.2.3 of JUnit has a publicly known major vulnerability. This validator should catch that.
   - We should probably use https://advisories.gitlab.com/ here.
5. SAST scanning. The SAST feature, but for a package's source code.
   - Scenario: I upload an NPM package. The source code of this package contains a Code Injection issue. This validator should catch that.
   - Note: This will only work on packages that include their source code. That is usually the case for interpreted languages such as JavaScript, Ruby and Python. Under some conditions, this is also the case for Java packages.
6. Other tools. The idea here is to implement other validations that are not yet implemented as features in GitLab. For this, I would take Package Hunter as a candidate/example.
   - Scenario: I upload an NPM package. That package has a pre-install script that pings a third party server during the package installation. This validation should catch that.
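The JUnit scenarios above share the same core check: match a package name and version against a list of known advisories. Here is a minimal Python sketch of that check. The `Advisory` shape, the `find_vulnerable` helper, and the `EXAMPLE-0001` identifier used in the usage note are made up for illustration and are not the format used by https://advisories.gitlab.com/. As a simplifying assumption, advisories list exact affected versions, whereas real advisories usually describe version ranges.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Advisory:
    # Hypothetical advisory shape, for illustration only.
    package: str
    affected_versions: Tuple[str, ...]  # assumption: exact versions, no ranges
    severity: str
    identifier: str


def find_vulnerable(
    dependencies: Dict[str, str], advisories: List[Advisory]
) -> List[Advisory]:
    """Return the advisories that match a {name: version} mapping."""
    hits = []
    for advisory in advisories:
        # .get() returns None for unknown names, which never matches a version.
        if dependencies.get(advisory.package) in advisory.affected_versions:
            hits.append(advisory)
    return hits
```

With an advisory such as `Advisory("JUnit", ("1.2.3",), "major", "EXAMPLE-0001")`, calling `find_vulnerable({"JUnit": "1.2.3"}, advisories)` at publication time covers the scenario of idea (1.). Re-running the same call whenever the advisory list changes approximates idea (2.), and passing the uploaded package's own name and version instead of its dependencies approximates idea (4.).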
🔭 Scope
For each idea, we should answer these questions:
- Have a sense of how the validator works.
- How is the validator executed? Background job? In a specific environment like a CI job?
- Is the architecture used by the validator ready for scale? The package registry receives a pretty large number of uploads per day, so the approach used by the validator should be ready for that.
- How are the results stored or could be stored? Could we link that with a Package?
- Ready for scale? There is not really a limit on how many packages a project can host.
- Is there a UI that could be re-used to display those results?
- Have a sense of how much time it takes for the validator to execute.
Please note that I have left one question out of scope: what do we do with the results?
How the package registry should react to results is a different question. We will probably need a separate issue for that.
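As a starting point for the storage and timing questions above, here is a rough Python sketch of what a per-package result record could look like. The `ValidationResult` fields and the `run_timed` helper are hypothetical and only illustrate linking a result to a package and measuring how long a validator takes. They are not an existing GitLab schema.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ValidationResult:
    # Hypothetical record shape: one row per (package, validator) run.
    package_id: int          # link back to the Package
    validator: str           # e.g. "dependency_scanning"
    findings: List[str]
    duration_seconds: float  # how long the validator took to execute


def run_timed(
    package_id: int, validator: str, check: Callable[[], List[str]]
) -> ValidationResult:
    """Run one validator check and record its findings and duration."""
    started = time.monotonic()
    findings = check()
    elapsed = time.monotonic() - started
    return ValidationResult(package_id, validator, findings, elapsed)
```

Recording the duration next to each result would let us compare validators on real uploads before deciding how they should be executed.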
Bonus points
- Suggest an implementation order. Obviously, we're not going to implement all ideas at once. Which one should go first?
- For (4.), it would be very helpful to know how https://advisories.gitlab.com/ is accessible from the Rails backend: do we need to make an API call? Is that database synced into the Rails database?