Reuse dependency graph in Gemnasium codebase
Problem to solve
Now that Gemnasium supports Sbt 1.3+ projects by parsing the Graphviz DOT files exported by the sbt-dependency-graph plugin, it ends up building the dependency graph twice:
- when parsing DOT files; see scanner/parser/sbt/dotgraph.go
- when generating the Dependency Scanning report, in order to render the dependency paths; see convert/file_converter.go
This redundancy adds to the complexity of the code, and is inefficient.
Proposal
Refactor so that dependency file parsers can directly return dependency graphs, which would then be used to render the Dependency Scanning report and CycloneDX SBOMs.
Parsers could either return:
- a single graph that captures all dependency categories, like dev and production dependencies
- a slice of graphs where each graph corresponds to a dependency category, like dev dependencies; see #343043 (comment 1122877118)
If we implement the latter, then the caller (currently the scanner) could easily filter out graphs according to DS_INCLUDE_DEV_DEPENDENCIES, or to other CI variables used to select dependency categories. The graphs could be merged into a single graph before being added to the CycloneDX SBOM, or added separately. Related issue: #366168 (closed)
Parsers that can't generate dependency graphs would simply return a list of packages. Discussion: #324617 (comment 537032978)
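The "one graph per category" option can be sketched as follows. This is only an illustration under stated assumptions: the Category and Graph types and the filterGraphs and mergeGraphs helpers are hypothetical names, not existing Gemnasium code, and the way DS_INCLUDE_DEV_DEPENDENCIES is read here (anything other than "false" includes dev dependencies) is a simplification.

```go
package main

import (
	"fmt"
	"os"
)

// Category is a hypothetical label for a dependency category (scope/group).
type Category string

const (
	Production  Category = "production"
	Development Category = "development"
)

// Graph is a hypothetical, minimal dependency graph for one category.
// Edges maps a package name to the names of its direct dependencies.
type Graph struct {
	Category Category
	Edges    map[string][]string
}

// filterGraphs drops development graphs unless includeDev is true.
func filterGraphs(graphs []Graph, includeDev bool) []Graph {
	var kept []Graph
	for _, g := range graphs {
		if g.Category == Development && !includeDev {
			continue
		}
		kept = append(kept, g)
	}
	return kept
}

// mergeGraphs combines the edges of all graphs into a single edge map,
// recording each (from, to) edge only once.
func mergeGraphs(graphs []Graph) map[string][]string {
	merged := map[string][]string{}
	seen := map[[2]string]bool{}
	for _, g := range graphs {
		for from, tos := range g.Edges {
			for _, to := range tos {
				key := [2]string{from, to}
				if seen[key] {
					continue
				}
				seen[key] = true
				merged[from] = append(merged[from], to)
			}
		}
	}
	return merged
}

func main() {
	// What a parser might return: one graph per dependency category.
	graphs := []Graph{
		{Category: Production, Edges: map[string][]string{"app": {"lib-a"}, "lib-a": {"lib-b"}}},
		{Category: Development, Edges: map[string][]string{"app": {"test-lib"}}},
	}

	// Assumption: dev dependencies are included unless the variable is "false".
	includeDev := os.Getenv("DS_INCLUDE_DEV_DEPENDENCIES") != "false"

	merged := mergeGraphs(filterGraphs(graphs, includeDev))
	fmt.Println("direct dependencies of app:", merged["app"])
}
```

With this shape, the caller decides which categories to keep before any report is rendered, and merging stays an optional last step for consumers that want a single graph.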
Implementation plan
TO BE UPDATED! Check new proposal, and the option where parsers return one graph per dependency category/scope/group.
- change the parser so that parsers can either return a package list or a full dependency graph
  - introduce a PackageList interface that defines a Packages() []Package function
  - remove the Dependency struct type
  - change ParseFunc so that it only returns a PackageList and an error
  - change all parsers so that what they return implements PackageList
- introduce a shared graph
  - it can be used to build a dependency graph
  - it can be used to query a graph and retrieve nodes corresponding to packages
  - it implements the PackageList interface
- change convert to use the shared graph structure
  - use type assertion to distinguish a dependency graph from a simple PackageList
  - when a dependency graph is detected, use it to calculate dependency paths
- change parsers that provide graph information so that they use the shared graph structure; graph exports are used in unit tests
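A minimal sketch of how the pieces of this plan could fit together. Only PackageList and its Packages() []Package function come from the plan; the Package fields, FlatList, Graph, NewGraph, AddEdge, PathTo and describe are hypothetical names chosen for illustration, and the shortest-path search is an assumption about how dependency paths might be calculated.

```go
package main

import (
	"fmt"
	"strings"
)

// Package identifies a dependency; the fields are assumptions for this sketch.
type Package struct {
	Name    string
	Version string
}

// PackageList is the interface described in the plan.
type PackageList interface {
	Packages() []Package
}

// FlatList is a hypothetical PackageList for parsers without graph information.
type FlatList []Package

func (l FlatList) Packages() []Package { return l }

// Graph is a hypothetical shared graph structure that also implements PackageList.
type Graph struct {
	nodes     []Package             // nodes in insertion order
	seen      map[Package]bool      // packages already added as nodes
	edges     map[Package][]Package // direct dependencies
	hasParent map[Package]bool      // packages that something depends on
}

func NewGraph() *Graph {
	return &Graph{seen: map[Package]bool{}, edges: map[Package][]Package{}, hasParent: map[Package]bool{}}
}

// AddEdge records that from depends directly on to.
func (g *Graph) AddEdge(from, to Package) {
	for _, p := range []Package{from, to} {
		if !g.seen[p] {
			g.seen[p] = true
			g.nodes = append(g.nodes, p)
		}
	}
	g.edges[from] = append(g.edges[from], to)
	g.hasParent[to] = true
}

func (g *Graph) Packages() []Package { return g.nodes }

// PathTo returns a shortest path from a top-level package (one without a
// parent) to target, using breadth-first search, or nil if none exists.
func (g *Graph) PathTo(target Package) []Package {
	prev := map[Package]Package{}
	visited := map[Package]bool{}
	var queue []Package
	for _, n := range g.nodes {
		if !g.hasParent[n] {
			visited[n] = true
			queue = append(queue, n)
		}
	}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == target {
			path := []Package{cur}
			for p, ok := prev[cur]; ok; p, ok = prev[p] {
				path = append([]Package{p}, path...)
			}
			return path
		}
		for _, next := range g.edges[cur] {
			if !visited[next] {
				visited[next] = true
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

// describe mimics what convert could do: a type assertion tells a graph
// apart from a simple PackageList, and paths are added only for graphs.
func describe(list PackageList) []string {
	g, isGraph := list.(*Graph)
	var lines []string
	for _, p := range list.Packages() {
		line := p.Name + "@" + p.Version
		if isGraph {
			if path := g.PathTo(p); len(path) > 1 {
				names := make([]string, len(path))
				for i, n := range path {
					names[i] = n.Name
				}
				line += " via " + strings.Join(names, " > ")
			}
		}
		lines = append(lines, line)
	}
	return lines
}

func main() {
	app := Package{Name: "app", Version: "1.0"}
	a := Package{Name: "lib-a", Version: "2.0"}
	b := Package{Name: "lib-b", Version: "3.0"}
	g := NewGraph()
	g.AddEdge(app, a)
	g.AddEdge(a, b)
	fmt.Println(strings.Join(describe(g), "\n"))
	fmt.Println(strings.Join(describe(FlatList{a, b}), "\n"))
}
```

Because both FlatList and *Graph satisfy PackageList, a single ParseFunc signature would cover every parser, and convert would only need the graph-specific branch where dependency paths are rendered.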
Follow-up: also leverage the dependency graph in remediate
Improvements
The code is simplified. In particular, convert is no longer responsible for building the graph, and parser no longer has to serialize the Dependencies.
Risks
None, provided the unit tests are updated accordingly.
Involved components
Multiple components of Gemnasium are impacted by this change:
- convert
- remediate
- scanner/parser/sbt
- and all parsers that provide graph information, even if they don't parse DOT files
Optional: Intended side effects
Time is saved when processing Sbt projects, because the dependency graph is only built once. This might not be significant though.
Optional: Missing test coverage
None