Report exact location of vulnerable npm module installed during dependency scan

Problem to solve

When running Dependency Scanning on an npm or yarn project, and when the git repo doesn't contain a node_modules directory, the report created by the Retire.js analyzer says that all vulnerabilities originate from package.json, even if the affected module is a vendored library detected in node_modules. This is confusing:

There's no distinction between A. affected modules and B. affected vendored libraries found in these modules.
There's no distinction between A. a vendored library a module uses at runtime and B. a vendored library used for testing.
The module where the vendored library has been detected isn't mentioned.

Also, vulnerabilities with the same module name and the same CVE id might collide because of this, even if:

One has been detected in a module listed in the lock file, and the other has been detected in vendored library, in one of those modules.
The vulnerabilities come from vendored libraries detected in different modules.

Examples:

Current behavior has been implemented when fixing #13827 (closed). It was decided to set the vulnerability location to package.json when the vulnerable file does not belong to the repo, to avoid broken links.

Intended users

Further details

When the scanned npm/yarn project has no node_modules directory, the Retire.js analyzer installs the dependencies prior to the scan so that the Retire.js CLI can track vendored library and report vulnerabilities for these. Vendored libraries are detected based on their filename or fingerprint. The Retire.js analyzer sets the location to package.json so that users can jump from the vulnerability modal to the affected dependency file.

Proposal

Add a property to the vulnerability location to indicate that the file (where the vulnerability has been found) isn't in the git repository, and that the UI shouldn't link to it. The analyzer is responsible for detecting files that are not part of the repo, and to report them as such. The Security Report Schema is extended to carry this information between the analyzer producing the report, and the Rails backend consuming it. When serving the frontend, the Rails backend renders a link to the vulnerable only if the Security Report says it's part of the repo.

Please note that the Rails backend already has the information needed to establish whether a file is in the repository. However, the analyzer is responsible for providing this information, for efficiency: we assume that the Rails backend wouldn't be able to serve the frontend quick enough if it had to access the git repo. Ideally the backend would parse the security reports when the pipeline ends, and resolve the vulnerable paths once for all. Unfortunately, right now it parses the reports each time it serves the frontend, so it can't enhance the report with reusable data.

This proposal isn't specific to Retire.js, and it could be leveraged by other analyzers in the future. For instance, Gemnasium could use this when scanning a lock file, and communicate that this file isn't part of the repo; it's been created by some build job, by the job before_script, or by Gemnasium itself.

Figuring out whether a file is part of the git repo could be implemented in the common library, for reusability and separation of concerns. Ideally there's no change in the analyzer out of the upgrade of the common library.

Since all Dependency Scanning ship with git, we could use the git checkout or git ls-files to tell if a file or directory is in the repo being scanned. git checkout HEAD xyz fails if xyz isn't in the repo. git ls-files xyz never fails, but its output only contains repository files.

% git checkout HEAD README.md && echo OK
Updated 0 paths from a59a4ba
OK

% git checkout HEAD MISSING && echo OK  
error: pathspec 'MISSING' did not match any file(s) known to git

% git ls-files MISSING && echo OK
OK

% git ls-files README.md && echo OK
README.md
OK

As a next step, the Secure Report Schema and the Dependency Scanning analyzers could be extended to report both the file where the vulnerability was detected AND the corresponding project file or dependency file. For example, it would report both the exact path of the vendored library, in the node_modules directory, and the path of the corresponding package.json that caused the installation of this vendored library. The backend and frontend would be updated to leverage this information, and they would link to the project/dependency file when the exact vulnerability location doesn't match the content of the git repo.

Old proposals:

Proposal A: Dependency path

A generic solution is to accurately report the dependency path, from the analyzer generating the report to the UI. See #198034 (closed)

Pros:

generic
accurate

Cons:

There's currently no code to make the Retire.js analyzer build the dependency graph, and extract a dependency path. This feature has been implemented in Gemnasium, and there's no plan for porting it to Retire.js.
This doesn't solve the question of deciding whether a link to a lock file should show up in the UI. This is being addressed by Proposal C though.

Proposal B: No link to generated file

Another option is to report the exact location, and communicate that the corresponding file has been generated at run-time and is not clickable. This is Proposal 3 of #13827 (comment 234565550).

Pros: cheap compared to #198034 (closed) (Proposal A)

Proposal C: Report both locations, let the backend decides what to show

Report both the exact location of the vulnerable dependency and the location of the editable dependency file, and make the Rails backend present the more accurate location if there's a match in the git repo, or else the location of the editable dependency file.

The analyzer would be responsible for reporting sets of locations, and the Rails backend would be responsible to figuring out what exists in the repo, and what should be presented to users.

Implementation plan:

extend Security Report syntax
make analyzers report both locations
change the Rails backend

Pros:

generic, accurate, flexible
good separation of concerns
single location for the code responsible for figuring out if a file is in the repo; it's in the Rails backend, and not repeated in the analyzers

Cons:

report format needs to be changed
backend needs to be changed
might result in a significant slowdown when backend serves the frontend

See #263441 (comment 434859721)

Proposal D: Report both locations, mark generated file as such

Another option is to both locations (dependency file and lock file), and communicate whether a particular file is part of the git repo. The analyzer is responsible for marking generated files as such, in the report. The backend and frontend only render links for files that are part of the git repo. New fields must be added to the report format.

Pros: accurate

Cons:

vulnerability feedback is lost
bad separation of concerns because analyzers depend on git, and assume they are scanning a git repo

This is Proposal 3 of #13827 (comment 234565550).

Implementation plan

update the Security Report Schemas, and add a new file to communicate that a vulnerability location doesn't belong to the git repo
update the issue package of the common library to track files that aren't part of the git repo, and report them as such
upgrade to this new version of common, update QA to cover this, and release a new version of retire.js
update the Rails backend so that it doesn't generate link for vulnerable files that aren't part of the git repo

The frontend implementation doesn't change.

Permissions and Security

No change.

Documentation

No change.

Availability & Testing

To be tested using existing test projects:

What does success look like, and how can we measure that?

Users can clearly distinguish multiple occurrences of the same vulnerability (same security advisory and same affected package) affecting a JavaScript library that is vendored in the node_modules directory. They can selectively dismiss some of these occurrences if they consider they are false-positives.

What is the type of buyer?

GitLab Ultimate

Links / references

/cc @dcouture @NicoleSchwartz @gonzoyumo

Edited Jan 18, 2022 by Fabien Catteau