Align codebase of all 3 gemnasium analyzers
Summary
There are important discrepancies in the codebase of the 3 Gemnasium analyzer projects. These discrepancies need to be removed so that these 3 projects can eventually share the same codebase and be merged into a single one.
Further details
- gemnasium-maven and gemnasium-python depend on common/command, but gemnasium doesn't.
- gemnasium uses Scanner.ScanDir, but gemnasium-python uses ScanReader and gemnasium-maven uses ScanFile.
- As a consequence, gemnasium is the only project that uses the scanner/finder package, and the other projects rely on custom detection logic.
- gemnasium-maven and gemnasium-python both run CLI commands to generate a file that can be parsed, whereas gemnasium directly parses supported files.
- gemnasium-maven and gemnasium-python look for supported files in a specific order, whereas gemnasium doesn't. In the case of gemnasium-python, default pip files win over setuptools files, and PIP_REQUIREMENTS_FILE wins over everything else.
- gemnasium scans all the supported files, whereas gemnasium-maven and gemnasium-python stop after the first match. gemnasium-maven could technically process multiple files, but this would be a behavior change. gemnasium-python couldn't because of other limitations: a scanned project would leak its dependencies to the next ones.
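The precedence rule described for gemnasium-python can be sketched as follows. This is a minimal illustration, not the analyzer's code: pickFile, defaultOrder, and the file names are hypothetical, and the fallback to the default order when the override file is absent is an assumption.

```go
package main

import "fmt"

// defaultOrder lists candidate files from highest to lowest precedence.
// The file names are illustrative assumptions: a pip file first, then a
// setuptools file.
var defaultOrder = []string{"requirements.txt", "setup.py"}

// pickFile returns the one file to scan among the files present in a
// directory. override stands in for the value of PIP_REQUIREMENTS_FILE.
func pickFile(present map[string]bool, override string) (string, bool) {
	// The explicit override wins over everything else.
	if override != "" && present[override] {
		return override, true
	}
	// Otherwise the first match in precedence order wins, and the search stops.
	for _, name := range defaultOrder {
		if present[name] {
			return name, true
		}
	}
	return "", false
}

func main() {
	present := map[string]bool{"setup.py": true, "requirements.txt": true, "deps.txt": true}

	name, _ := pickFile(present, "")
	fmt.Println(name) // requirements.txt

	name, _ = pickFile(present, "deps.txt")
	fmt.Println(name) // deps.txt
}
```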
Links
- https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/v2.21.0/main.go
- https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium-python/-/blob/v2.15.0/analyze.go
- https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium-maven/-/blob/v2.18.4/analyze.go
Proposal
- add structs that represent the package managers as well as the files they handle
- implement a generic detection logic that leverages these structs and returns supported projects; it can be configured to maintain the behavior of gemnasium-maven and gemnasium-python
- introduce "builders" as an abstraction for commands executed to export the dependencies to a file Gemnasium can parse (execution of the Gemnasium Maven plugin, pipenv graph, etc.)
- align all CLIs so that they use the same detection logic, build the projects that need to be built (where applicable), and scan them
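One possible shape for the structs and the generic detection logic listed above is sketched here. All type, field, and function names (File, PackageManager, Project, Detect, firstOnly) are hypothetical and chosen for illustration; the package managers and file names in main are examples only.

```go
package main

import "fmt"

// File is a file handled by a package manager.
type File struct {
	Filename      string
	RequiresBuild bool // true when a builder must run before parsing
}

// PackageManager lists the files a package manager handles, in order of preference.
type PackageManager struct {
	Name  string
	Files []File
}

// Project is a supported project detected in a directory.
type Project struct {
	Dir            string
	PackageManager string
	File           File
}

// Detect returns at most one project per package manager for the given
// directory listing. When firstOnly is true it stops after the first match,
// which mimics analyzers that only process a single file.
func Detect(dir string, filenames []string, managers []PackageManager, firstOnly bool) []Project {
	present := make(map[string]bool, len(filenames))
	for _, name := range filenames {
		present[name] = true
	}
	var projects []Project
	for _, pm := range managers {
		for _, f := range pm.Files {
			if !present[f.Filename] {
				continue
			}
			projects = append(projects, Project{Dir: dir, PackageManager: pm.Name, File: f})
			if firstOnly {
				return projects
			}
			break // keep a single file per package manager
		}
	}
	return projects
}

func main() {
	managers := []PackageManager{
		{Name: "npm", Files: []File{{Filename: "package-lock.json"}}},
		{Name: "maven", Files: []File{{Filename: "pom.xml", RequiresBuild: true}}},
	}
	files := []string{"README.md", "pom.xml", "package-lock.json"}
	for _, p := range Detect("app", files, managers, false) {
		fmt.Printf("%s: %s (build: %t)\n", p.PackageManager, p.File.Filename, p.File.RequiresBuild)
	}
}
```

Projects whose file has RequiresBuild set would be handed to a builder before scanning; the others can be parsed directly.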
Share as much code as possible, and let gemnasium-python and gemnasium-maven implement specific plugins:
- builder plugins, to build the project and generate files that can be parsed
- parser plugins, to parse the lock file or dependency graph, and extract a list of packages
- vrange plugins, to check if a version is included in a range
Each plugin system is implemented in the main gemnasium project, and this is where the plugin registry lives.
Each plugin lives with the project where it's used:
- builder plugins are introduced in gemnasium-maven and gemnasium-python
- vrange plugins are moved
- scanner/parser plugins are moved
For instance, vrange/python moves to gemnasium-python, and scanner/parser/mvnplugin moves to gemnasium-maven.
The gemnasium project still builds the gemnasium Docker image, but it uses the same API as gemnasium-python and gemnasium-maven, for consistency. In the long term, shared code could be extracted into a separate repository.
The scanner package becomes generic, and only has two exported methods: a walk function and a process function.
The walk function walks the given directory, and finds no more than one file per package type per directory. It queries the registered builders and parsers to figure out what files are supported. Parsers are queried first, so that ready-to-parse lock files win over dependency files (which require some kind of build). The walk function is configurable, and it can optionally stop right after the first match. This way gemnasium-maven and gemnasium-python can behave like they currently do.
The process function replaces ScanDir, ScanFile, and ScanReader, which are no longer needed.
Ideally, the main.go of a Gemnasium project is as simple as:
- creating a NewApp using the gemnasium/cli package, and running this app
See initial proposal
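The walk behavior described in this proposal could look roughly like the sketch below. It is not the scanner package's actual API: Plugin, Registry, Match, Walk, and the demo helpers are hypothetical names, the registered file names are examples, and an in-memory file system from testing/fstest stands in for a real project tree.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// Plugin maps a package type to the file name it handles (hypothetical shape).
type Plugin struct {
	PackageType string
	Filename    string
}

// Registry holds the registered parser and builder plugins.
type Registry struct {
	Parsers  []Plugin
	Builders []Plugin
}

// Match is a supported file found by Walk.
type Match struct {
	Path        string
	PackageType string
	NeedsBuild  bool
}

// errStop ends the walk early when stopAtFirstMatch is set.
var errStop = errors.New("stop after first match")

// Walk finds at most one file per package type per directory.
func Walk(fsys fs.FS, reg Registry, stopAtFirstMatch bool) ([]Match, error) {
	var matches []Match
	err := fs.WalkDir(fsys, ".", func(dir string, d fs.DirEntry, err error) error {
		if err != nil || !d.IsDir() {
			return err
		}
		entries, err := fs.ReadDir(fsys, dir)
		if err != nil {
			return err
		}
		present := map[string]bool{}
		for _, e := range entries {
			present[e.Name()] = !e.IsDir()
		}
		seen := map[string]bool{} // package types already matched in this directory
		pick := func(plugins []Plugin, needsBuild bool) {
			for _, p := range plugins {
				if present[p.Filename] && !seen[p.PackageType] {
					seen[p.PackageType] = true
					matches = append(matches, Match{
						Path: path.Join(dir, p.Filename), PackageType: p.PackageType, NeedsBuild: needsBuild,
					})
				}
			}
		}
		pick(reg.Parsers, false) // parsers are queried first, so lock files win
		pick(reg.Builders, true)
		if stopAtFirstMatch && len(matches) > 0 {
			matches = matches[:1]
			return errStop
		}
		return nil
	})
	if errors.Is(err, errStop) {
		err = nil
	}
	return matches, err
}

// demoRegistry registers example plugins; the file names are assumptions.
func demoRegistry() Registry {
	return Registry{
		Parsers:  []Plugin{{"npm", "package-lock.json"}, {"pypi", "Pipfile.lock"}},
		Builders: []Plugin{{"pypi", "requirements.txt"}, {"maven", "pom.xml"}},
	}
}

// demoFS builds a small in-memory project tree.
func demoFS() fs.FS {
	return fstest.MapFS{
		"api/Pipfile.lock":      &fstest.MapFile{},
		"api/requirements.txt":  &fstest.MapFile{},
		"svc/pom.xml":           &fstest.MapFile{},
		"web/package-lock.json": &fstest.MapFile{},
	}
}

func main() {
	all, _ := Walk(demoFS(), demoRegistry(), false)
	for _, m := range all {
		fmt.Println(m.Path, m.PackageType, m.NeedsBuild)
	}
}
```

In the api directory both a lock file and a dependency file are present for the same package type, so only the lock file is reported. Passing true as the last argument stops the walk at the first match, which is the behavior gemnasium-maven and gemnasium-python need to keep.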
Implementation plan
- introduce "builders" as an abstraction layer for the CLI tools executed to get a parseable dependency list
  - refactor gemnasium-maven using builders gitlab-org/security-products/analyzers/gemnasium-maven!78 (merged)
  - refactor gemnasium-python using builders gitlab-org/security-products/analyzers/gemnasium-python!69 (merged)
  - align gemnasium with this, and try it out using a bundle install or equivalent
- align "finders" so that they all support multiple projects, and so that parseable files win over files that require the execution of a "builder"
  - implement generic project finder in gemnasium gitlab-org/security-products/analyzers/gemnasium!134 (merged)
  - use generic finder in gemnasium-maven gitlab-org/security-products/analyzers/gemnasium-maven!87 (merged)
  - use generic finder in gemnasium-python gitlab-org/security-products/analyzers/gemnasium-python!76 (merged)
Improvements
- codebase is ready to be merged
- codebase is more consistent, making contribution easier
Risks
None identified
Involved components
Optional: Intended side effects
- implementing poetry.lock support becomes trivial #7006 (closed)
- implementing Pipfile.lock support becomes trivial #11756 (closed)
- it becomes easy to make yarn.lock win over package-lock.json #198032 (comment 381075492)
- multi-repo support for Java can easily be implemented #250650 (closed)
Optional: Missing test coverage
None