Align codebase of all 3 gemnasium analyzers
Summary
There are important discrepancies in the codebase of the 3 Gemnasium analyzer projects. These discrepancies need to be removed so that these 3 projects can eventually share the same codebase and be merged into a single one.
Further details
- gemnasium-maven and gemnasium-python depend on common/command, but gemnasium doesn't.
- gemnasium uses Scanner.ScanDir, but gemnasium-python uses ScanReader and gemnasium-maven uses ScanFile.
- As a consequence, gemnasium is the only project that uses the scanner/finder package, and the other projects rely on custom detection logic.
- gemnasium-maven and gemnasium-python both run CLI commands to generate a file that can be parsed, whereas gemnasium directly parses supported files.
- gemnasium-maven and gemnasium-python look for supported files in a specific order, whereas gemnasium doesn't. In the case of gemnasium-python, default pip files win over setuptools files, and PIP_REQUIREMENTS_FILE wins over everything else.
- gemnasium scans all the supported files, whereas gemnasium-maven and gemnasium-python stop after the first match. gemnasium-maven could technically process multiple files, but this would be a behavior change. gemnasium-python couldn't, because of other limitations: a scanned project would leak its dependencies to the next ones.
Links
- https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/v2.21.0/main.go
- https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium-python/-/blob/v2.15.0/analyze.go
- https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium-maven/-/blob/v2.18.4/analyze.go
Proposal
- add structs that represent the package managers as well as the files they handle
- implement a generic detection logic that leverages these structs and returns supported projects; it can be configured to maintain the behavior of gemnasium-maven and gemnasium-python
- introduce "builders" as an abstraction for commands executed to export the dependencies to a file Gemnasium can parse (execution of the Gemnasium Maven plugin, pipenv graph, etc.)
- align all CLIs so that they use the same detection logic, build the projects that need to be built (where applicable), and scan them
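A minimal Go sketch of the structs and the builder abstraction listed above, under stated assumptions: the names PackageManager, File, FileType, Builder and CommandBuilder are hypothetical and do not come from the gemnasium codebase, and the pipenv entry is only an example.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// FileType tells whether a file is ready to parse or needs a build first.
type FileType int

const (
	// FileTypeLockFile is a lock file or dependency graph, parsed directly.
	FileTypeLockFile FileType = iota
	// FileTypeRequirements is a dependency file that needs a builder.
	FileTypeRequirements
)

// File is a file handled by a package manager.
type File struct {
	Filename string
	Type     FileType
}

// PackageManager groups the files a package manager handles.
type PackageManager struct {
	Name        string
	PackageType string
	Files       []File
}

// Match returns the File entry of pm matching the base name of path, if any.
func (pm PackageManager) Match(path string) (File, bool) {
	base := filepath.Base(path)
	for _, f := range pm.Files {
		if f.Filename == base {
			return f, true
		}
	}
	return File{}, false
}

// Builder exports the dependencies of a project to a file Gemnasium can parse.
type Builder interface {
	Build(input string) (output string, err error)
}

// CommandBuilder runs a CLI in the project directory and returns the path
// of the file that the CLI is expected to generate there.
type CommandBuilder struct {
	Name   string   // executable to run
	Args   []string // its arguments
	Output string   // generated file, relative to the project directory
}

// Build implements Builder.
func (b CommandBuilder) Build(input string) (string, error) {
	dir := filepath.Dir(input)
	cmd := exec.Command(b.Name, b.Args...)
	cmd.Dir = dir
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("%s failed: %w: %s", b.Name, err, out)
	}
	return filepath.Join(dir, b.Output), nil
}

// Compile-time check that CommandBuilder satisfies Builder.
var _ Builder = CommandBuilder{}

func main() {
	pipenv := PackageManager{
		Name:        "pipenv",
		PackageType: "pypi",
		Files: []File{
			{Filename: "Pipfile.lock", Type: FileTypeLockFile},
			{Filename: "Pipfile", Type: FileTypeRequirements},
		},
	}
	f, ok := pipenv.Match("app/Pipfile")
	fmt.Println(f.Filename, f.Type == FileTypeRequirements, ok)
}
```

In this sketch, a CommandBuilder would wrap a command such as the Gemnasium Maven plugin execution or pipenv graph; the exact arguments are left out because they are not specified here.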
Share as much code as possible, and let gemnasium-python and gemnasium-maven implement specific plugins:
- builder plugins, to build the project and generate files that can be parsed
- parser plugins, to parse the lock file or dependency graph, and extract a list of packages
- vrange plugins, to check if a version is included in a range
Each plugin system is implemented in the main gemnasium project, and this is where the plugin registry lives.
Each plugin lives with the project where it's used:
- builder plugins are introduced in gemnasium-maven and gemnasium-python
- vrange plugins are moved accordingly
- scanner/parser plugins are moved accordingly
For instance, vrange/python moves to gemnasium-python, and scanner/parser/mvnplugin moves to gemnasium-maven.
The gemnasium project still builds the gemnasium Docker image, but it uses the same API as gemnasium-python and gemnasium-maven, for consistency. In the long term, shared code could be extracted into a separate repository.
The scanner package becomes generic, and only has two exported methods: walk and process.
The walk function walks the given directory, and finds no more than one file per package type per directory. It queries the registered builders and parsers to figure out what files are supported. Parsers are queried first, so that ready-to-parse lock files win over dependency files (which require some kind of build).
The walk function is configurable, and it can optionally stop right after the first match. This way gemnasium-maven and gemnasium-python can behave like they currently do.
The process function replaces ScanDir, ScanFile, and ScanReader, which are no longer needed.
Ideally, the main.go of a Gemnasium project is as simple as creating an app with NewApp from the gemnasium/cli package, and running this app.
See initial proposal
Implementation plan
- introduce "builders" as an abstraction layer for the CLI tools executed to get a parseable dependency list
  - refactor gemnasium-maven using builders gitlab-org/security-products/analyzers/gemnasium-maven!78 (merged)
  - refactor gemnasium-python using builders gitlab-org/security-products/analyzers/gemnasium-python!69 (merged)
  - align gemnasium with this, and try it out using bundle install or an equivalent command
- align "finders" so that they all support multiple projects, and so that parseable files win over files that require the execution of a "builder"
  - implement generic project finder in gemnasium gitlab-org/security-products/analyzers/gemnasium!134 (merged)
  - use generic finder in gemnasium-maven gitlab-org/security-products/analyzers/gemnasium-maven!87 (merged)
  - use generic finder in gemnasium-python gitlab-org/security-products/analyzers/gemnasium-python!76 (merged)
Improvements
- codebase is ready to be merged
- codebase is more consistent, making contribution easier
Risks
None identified
Involved components
Optional: Intended side effects
- implementing poetry.lock support becomes trivial #7006 (closed)
- implementing Pipfile.lock support becomes trivial #11756 (closed)
- it becomes easy to make yarn.lock win over package-lock.json #198032 (comment 381075492)
- multi-repo support for Java can easily be implemented #250650 (closed)
Optional: Missing test coverage
None