Document Dependency Scanning paradigm: multiple lock files, or one requirements/parent build file per job

Problem to solve

It's not clear whether a Dependency Scanning analyzer scans multiple dependency files, or a single file.

We need to document how it works today. We also need to solicit feedback from users who experience pain with the current behavior, so that we can settle on a single rule for both the users running the scanners and the developers implementing them in the future. This rule needs to be documented so that users can predict how scanners will behave.

Intended users

Further details

Currently there seems to be a discrepancy between the Gemnasium-based analyzers:

This should be codified as two existing rules that function today.

Proposal

Open a new issue asking users whether they have run into problems with the current behavior, and which of the following would have solved them:

When executing in a CI job, a Dependency Scanning analyzer would process either:

  • multiple lock files (default)
  • one requirements file (fallback)

When processing a requirements file, the analyzer installs the project dependencies using the package manager, which is expensive in both time and bandwidth. This is why analyzers should NOT process multiple requirements files by default. It also makes sense to run multiple Dependency Scanning jobs to process multiple requirements files, to reduce the overall execution time of the pipeline.
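To illustrate the "multiple jobs for multiple requirements files" idea, here is a minimal sketch that maps each requirements file to its own job description. It is not how any analyzer or CI template works today: the function `plan_jobs`, the job names, and the dictionary keys are all hypothetical and exist only for this example.

```python
from pathlib import PurePosixPath


def plan_jobs(requirements_files):
    """Map each requirements file to its own job description.

    All names here (job names, dictionary keys) are hypothetical and only
    illustrate the "one requirements file per job" idea.
    """
    jobs = []
    for index, path in enumerate(sorted(requirements_files), start=1):
        jobs.append({
            "name": f"dependency_scanning_{index}",  # hypothetical job name
            "working_dir": str(PurePosixPath(path).parent),
            "requirements_file": PurePosixPath(path).name,
        })
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(["api/requirements.txt", "worker/requirements.txt"]):
        print(job)
```

Because each job installs dependencies for a single file, the jobs can run in parallel, which is where the reduction in overall pipeline time would come from.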

Analyzers should first attempt to parse and process lock files, because this is both more accurate (it reflects the exact versions used in production) and much cheaper (no need to install the dependencies). They should process a single requirements file as a fallback, or when explicitly requested to do so (variables to be defined later).
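The proposed rule can be sketched as a small selection function. This is only an illustration under stated assumptions, not the analyzers' actual code: the file-name sets, the `force_requirements` flag (standing in for the variables that are still to be defined), and the choice of "first file in sorted order" as the single requirements file are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical file-name sets, chosen for illustration only.
LOCK_FILES = {"Gemfile.lock", "package-lock.json", "yarn.lock", "composer.lock"}
REQUIREMENTS_FILES = {"requirements.txt", "pom.xml"}


def select_files(paths, force_requirements=False):
    """Return (mode, files) for one job, following the proposed rule.

    - default: every lock file that was found
    - fallback, or when explicitly requested: a single requirements file
    """
    ordered = sorted(paths)
    lock_files = [p for p in ordered if PurePosixPath(p).name in LOCK_FILES]
    requirements = [p for p in ordered if PurePosixPath(p).name in REQUIREMENTS_FILES]

    if lock_files and not force_requirements:
        return "lock", lock_files
    if requirements:
        # Assumption: take the first match in sorted order; never more than one.
        return "requirements", requirements[:1]
    return "none", []


if __name__ == "__main__":
    print(select_files(["Gemfile.lock", "sub/yarn.lock", "requirements.txt"]))
    print(select_files(["a/requirements.txt", "b/requirements.txt"]))
```

With this shape, a project that has lock files never triggers a dependency installation unless the user opts in, and a project with only requirements files gets exactly one installation per job.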

This already reflects the way gemnasium, gemnasium-python, and gemnasium-maven currently behave.

Implementation plan

Use this commit as a starting point

  1. Add a new column, "Processes multiple files?", to the "Supported languages and package managers" section. This column should link to a new section in the docs, possibly named "How multiple files are processed".

  2. In this new section, add the following sub-sections:

    • Ruby
    • Python
    • Java
    • PHP, NuGet, Go, <everything else>

    Provide detailed information in each of the above sub-sections, explaining how files are processed. See this commit for a starting point.

Documentation

To be documented in https://docs.gitlab.com/ee/user/application_security/dependency_scanning/index.html

Testing

none

What does success look like, and how can we measure that?

Users can easily predict how dependency files are scanned. More specifically, they know whether a Dependency Scanning job scans one file or multiple files, and how to configure their CI pipeline to scan multiple requirements files.

What is the type of buyer?

TODO

Links / references

/cc @NicoleSchwartz @gonzoyumo @ifrenkel @plafoucriere

Edited by Adam Cohen