Reuse dependency graph in Gemnasium codebase

Problem to solve

Now that Gemnasium supports Sbt 1.3+ projects by parsing the Graphviz DOT files exported by the sbt-dependency-graph plugin, it ends up building the dependency graph twice.

This redundancy adds to the complexity of the code, and is inefficient.

Proposal

Refactor so that dependency file parsers can directly return dependency graphs, which would then be used to render the Dependency Scanning report and CycloneDX SBOMs.

Parsers could either return:

  • a single graph that captures all dependency categories, like dev and production dependencies
  • a slice of graphs, where each graph corresponds to a dependency category, like dev dependencies; see #343043 (comment 1122877118)

If we implement the latter, then the caller (currently the scanner) could easily filter out graphs according to DS_INCLUDE_DEV_DEPENDENCIES, or to other CI variables used to select dependency categories. The graphs could be merged into a single graph before being added to the CycloneDX SBOM, or added separately. Related issue: #366168 (closed)
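
To make the per-category option more concrete, here is a minimal sketch of how a caller could filter and merge graphs. Everything in it is an assumption made for illustration: the Graph and Category types, the FilterGraphs and MergeGraphs helpers, and the adjacency-list representation are hypothetical and do not reflect the existing Gemnasium code. It also assumes that dev dependencies are included unless DS_INCLUDE_DEV_DEPENDENCIES is set to "false".

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// Category is a hypothetical label for a dependency category.
type Category string

const (
	Production  Category = "production"
	Development Category = "development"
)

// Graph is a hypothetical, minimal dependency graph for one category.
// Edges maps a package name to the names of its direct dependencies.
type Graph struct {
	Category Category
	Edges    map[string][]string
}

// FilterGraphs drops development graphs unless includeDev is true.
func FilterGraphs(graphs []Graph, includeDev bool) []Graph {
	var kept []Graph
	for _, g := range graphs {
		if g.Category == Development && !includeDev {
			continue
		}
		kept = append(kept, g)
	}
	return kept
}

// MergeGraphs combines several graphs into one, registering every node
// and skipping edges that were already seen in a previous graph.
func MergeGraphs(graphs []Graph) Graph {
	merged := Graph{Edges: map[string][]string{}}
	seen := map[[2]string]bool{}
	for _, g := range graphs {
		for from, tos := range g.Edges {
			if _, ok := merged.Edges[from]; !ok {
				merged.Edges[from] = nil
			}
			for _, to := range tos {
				if _, ok := merged.Edges[to]; !ok {
					merged.Edges[to] = nil
				}
				key := [2]string{from, to}
				if seen[key] {
					continue
				}
				seen[key] = true
				merged.Edges[from] = append(merged.Edges[from], to)
			}
		}
	}
	// Sort dependency names so that the merged graph is deterministic.
	for from := range merged.Edges {
		sort.Strings(merged.Edges[from])
	}
	return merged
}

func main() {
	graphs := []Graph{
		{Category: Production, Edges: map[string][]string{"app": {"lib-a"}, "lib-a": {"lib-b"}}},
		{Category: Development, Edges: map[string][]string{"app": {"test-kit"}, "test-kit": {"lib-b"}}},
	}
	// Assumption: dev dependencies are included unless the variable is "false".
	includeDev := os.Getenv("DS_INCLUDE_DEV_DEPENDENCIES") != "false"
	merged := MergeGraphs(FilterGraphs(graphs, includeDev))
	fmt.Println(len(merged.Edges), "packages in the merged graph")
}
```

With one graph per category, the filtering step is a plain loop over the slice. A single combined graph would instead need category information on each node or edge to support the same filtering.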

Parsers that can't generate dependency graphs would simply return a list of packages. Discussion: #324617 (comment 537032978)

Implementation plan

TO BE UPDATED! Check new proposal, and the option where parsers return one graph per dependency category/scope/group.

  • change parser so that parsers can return either a package list or a full dependency graph
    • introduce a PackageList interface that defines a Packages() []Package function
    • remove the Dependency struct type
    • change ParseFunc so that it only returns a PackageList and an error
    • change all parsers so that what they return implements PackageList
  • introduce a shared graph
    • it can be used to build a dependency graph
    • it can be used to query a graph and retrieve nodes corresponding to packages
    • it implements the PackageList interface
  • change convert to use the shared graph structure
    • use type assertion to distinguish a dependency graph from a simple PackageList
    • when a dependency graph is detected, use it to calculate dependency paths
  • change parsers that provide graph information so that they use the shared graph structure; graph exports are used in unit tests
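
The plan above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Gemnasium implementation: only the PackageList interface and its Packages() []Package function come from the plan, while the Package fields, the FlatList and Graph types, and the NewGraph, AddEdge, PathTo and Describe functions are hypothetical names. Describe stands in for the part of convert that uses a type assertion to decide whether dependency paths can be calculated.

```go
package main

import (
	"fmt"
	"strings"
)

// Package identifies a dependency; the fields are assumptions for this sketch.
type Package struct {
	Name    string
	Version string
}

// PackageList is the interface named in the plan.
type PackageList interface {
	Packages() []Package
}

// FlatList is a hypothetical PackageList for parsers without graph information.
type FlatList []Package

func (l FlatList) Packages() []Package { return l }

// Graph is a hypothetical shared graph rooted at the scanned project.
type Graph struct {
	root  Package
	nodes []Package // insertion order, root first
	known map[Package]bool
	edges map[Package][]Package
}

func NewGraph(root Package) *Graph {
	return &Graph{
		root:  root,
		nodes: []Package{root},
		known: map[Package]bool{root: true},
		edges: map[Package][]Package{},
	}
}

// AddEdge records that from depends on to, registering both nodes.
func (g *Graph) AddEdge(from, to Package) {
	for _, p := range []Package{from, to} {
		if !g.known[p] {
			g.known[p] = true
			g.nodes = append(g.nodes, p)
		}
	}
	g.edges[from] = append(g.edges[from], to)
}

// Packages implements PackageList; the root project itself is left out.
func (g *Graph) Packages() []Package { return g.nodes[1:] }

// PathTo returns a shortest path from the root to target, found with a
// breadth-first search, or nil if target is unreachable.
func (g *Graph) PathTo(target Package) []Package {
	parent := map[Package]Package{}
	visited := map[Package]bool{g.root: true}
	queue := []Package{g.root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == target {
			path := []Package{cur}
			for cur != g.root {
				cur = parent[cur]
				path = append([]Package{cur}, path...)
			}
			return path
		}
		for _, next := range g.edges[cur] {
			if !visited[next] {
				visited[next] = true
				parent[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

// Describe lists packages and adds a dependency path when list is a graph.
func Describe(list PackageList) []string {
	graph, isGraph := list.(*Graph) // type assertion: graph or plain list?
	var lines []string
	for _, p := range list.Packages() {
		line := p.Name + "@" + p.Version
		if isGraph {
			var names []string
			for _, step := range graph.PathTo(p) {
				names = append(names, step.Name)
			}
			line += " via " + strings.Join(names, " > ")
		}
		lines = append(lines, line)
	}
	return lines
}

func main() {
	app := Package{Name: "app", Version: "1.0"}
	libA := Package{Name: "lib-a", Version: "2.0"}
	libB := Package{Name: "lib-b", Version: "3.0"}
	g := NewGraph(app)
	g.AddEdge(app, libA)
	g.AddEdge(libA, libB)
	fmt.Println(strings.Join(Describe(g), "\n"))
	fmt.Println(strings.Join(Describe(FlatList{libA}), "\n"))
}
```

Because both FlatList and *Graph satisfy PackageList, ParseFunc could keep a single return type, and only the code that needs dependency paths has to know about the graph.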

Follow-up: also leverage the dependency graph in remediate

Improvements

The code is simplified. In particular, convert is no longer responsible for building the graph, and parser no longer has to serialize the Dependencies.

Risks

None, provided the unit tests are updated accordingly.

Involved components

Multiple components of Gemnasium are impacted by this change.

Optional: Intended side effects

Time is saved when processing Sbt projects, because the dependency graph is only built once. This might not be significant though.

Optional: Missing test coverage

None

/cc @ifrenkel @adamcohen @gonzoyumo

Edited by Fabien Catteau