Incorrect usage of go.sum in go dependency scanning
Release notes
TBD
Summary
We recently started using the awesome dependency scanning feature, especially for Go projects.
When using the scanner, we often get false positives because the scanner does not recognize updated dependencies in the go.mod file.
When a dependency and its transitive dependencies are updated via go get -u, the go.sum file contains both the old and the new version of the dependency, while in practice only the newly specified version is used.
Even go mod tidy will not clean up the file here if a transitive dependency specifies a different version in its own go.mod file.
This leads to false positives during the dependency scan.
In my opinion, parsing go.sum to find all used packages is not fully correct, as multiple versions of a package will show up.
Steps to reproduce
Create 3 Go projects:
- second-module (v1.0.0 and v1.1.0): contains the exposed module function Second()
- first-module (v1.0.0): uses the Second() function from second-module and exposes it as First(); specifies v1.0.0 of second-module in its go.mod
- main-project: calls the First() function of first-module
Finally, call go get -u first-module in the main-project, which will update first-module and all its dependencies.
Take note of the updated go.sum file in main-project. It will now contain references to both v1.0.0 and v1.1.0 of second-module, while only v1.1.0 is actually used.
This will lead to false positives in the dependency scanning.
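To make the reproduction easier to check, here is a minimal sketch that groups the versions go.sum mentions per module and flags modules that appear more than once. The module paths and the `h1:` hashes are placeholders invented for illustration, and `versionsByModule` is a hypothetical helper, not part of the analyzer.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// versionsByModule collects every version that a go.sum file mentions
// for each module path. Each go.sum line has three fields:
// module path, version (optionally suffixed with "/go.mod"), and hash.
func versionsByModule(goSum string) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, line := range strings.Split(goSum, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		path := fields[0]
		version := strings.TrimSuffix(fields[1], "/go.mod")
		if seen[path] == nil {
			seen[path] = map[string]bool{}
		}
		seen[path][version] = true
	}

	out := map[string][]string{}
	for path, versions := range seen {
		for v := range versions {
			out[path] = append(out[path], v)
		}
		sort.Strings(out[path])
	}
	return out
}

func main() {
	// Hypothetical go.sum content for main-project; hashes are placeholders.
	const goSum = `example.com/first-module v1.0.0 h1:placeholder=
example.com/first-module v1.0.0/go.mod h1:placeholder=
example.com/second-module v1.0.0/go.mod h1:placeholder=
example.com/second-module v1.1.0 h1:placeholder=
example.com/second-module v1.1.0/go.mod h1:placeholder=
`
	for path, versions := range versionsByModule(goSum) {
		if len(versions) > 1 {
			fmt.Printf("%s appears with %d versions: %v\n", path, len(versions), versions)
		}
	}
}
```

A scanner that treats every go.sum entry as a used dependency would report both versions of the second module here.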
Example Project
tbd
What is the current bug behavior?
False positives in the go dependency scanning, as old unused imports are scanned as well.
What is the expected correct behavior?
Only actually used modules are scanned and will show up in the scan results
Proposed fix
Utilize golang.org/x/tools/go/packages
| Pros | Cons |
|---|---|
| Outputs the modules required by the main module (the module of the repository scanned). | Requires the Go toolchain to be installed. The packaged version of this is around 100 MB. |
| Can report or filter out test dependencies. | Requires a goproxy to be set up for offline module downloads, vendored dependencies, or a modcache with all dependencies preloaded. |
Please let me know what you think, also let me know if I misunderstood the behavior here. Thanks a lot for this awesome tool!
Older proposals
Keep the highest version of module
The correct way would probably be to parse the go.mod file and all its references recursively, merging versions specified on upper levels into lower levels accordingly. => gitlab-org/security-products/analyzers/gemnasium!218 (closed)
| Pros | Cons |
|---|---|
| Works in online and offline environments. | Can produce false negatives, i.e. missed vulnerabilities. |
| Does not require the Go toolchain to be installed. | |
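As an illustration of the "keep the highest version" idea, the sketch below reduces a list of module/version pairs to the highest version per module. It is a simplification, not the code from the merge request: `parseVersion` only understands plain vMAJOR.MINOR.PATCH strings and skips pre-releases and pseudo-versions, which a real implementation would more likely handle with `golang.org/x/mod/semver`. All names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// moduleVersion is one (module path, version) pair, e.g. taken from go.sum.
type moduleVersion struct{ Path, Version string }

// parseVersion splits "vMAJOR.MINOR.PATCH" into three integers.
// Simplification: anything else (pre-release, pseudo-version) is rejected.
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// less reports whether version a is lower than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// highestVersions keeps only the highest parseable version of each module.
func highestVersions(entries []moduleVersion) map[string]string {
	best := map[string]string{}
	for _, e := range entries {
		candidate, ok := parseVersion(e.Version)
		if !ok {
			continue // unsupported version format in this sketch
		}
		current, seen := best[e.Path]
		if !seen {
			best[e.Path] = e.Version
			continue
		}
		currentParsed, _ := parseVersion(current)
		if less(currentParsed, candidate) {
			best[e.Path] = e.Version
		}
	}
	return best
}

func main() {
	entries := []moduleVersion{
		{"example.com/second-module", "v1.0.0"},
		{"example.com/second-module", "v1.1.0"},
		{"example.com/first-module", "v1.0.0"},
	}
	fmt.Println(highestVersions(entries))
}
```

The highest version listed is not necessarily the version the build selects, which is one way this approach can miss vulnerabilities.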
Utilize go list tool
Also, there is go list -json -deps, which looks promising and might be a better fit. => gitlab-org/security-products/analyzers/gemnasium!203 (closed)
| Pros | Cons |
|---|---|
| Outputs the modules required by the main module (the module of the repository scanned). | Requires the Go toolchain to be installed. The packaged version of this is around 100 MB. |
| Can report or filter out test dependencies. | Requires a goproxy to be set up for offline module downloads, vendored dependencies, or a modcache with all dependencies preloaded. |
MR: List go dependencies using go list (gitlab-org/security-products/analyzers/gemnasium!203 - closed)
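For context, go list -deps -json prints a stream of concatenated JSON objects, one per package, and each non-standard-library package carries a Module object. The following sketch shows one way such output could be reduced to a module list. It decodes a string rather than running the command, the sample input is invented, and `modulesFromGoList` is a hypothetical helper rather than the code from the merge request.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"sort"
	"strings"
)

// goModule mirrors a few fields of the Module object in `go list -json` output.
type goModule struct {
	Path    string
	Version string
	Main    bool
}

// listedPackage mirrors a few fields of a package in `go list -json` output.
type listedPackage struct {
	ImportPath string
	Module     *goModule // nil for standard library packages
}

// modulesFromGoList decodes the JSON stream and returns the distinct
// "path@version" pairs of all modules other than the main module.
func modulesFromGoList(output string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(output))
	seen := map[string]bool{}
	for {
		var pkg listedPackage
		err := dec.Decode(&pkg)
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if pkg.Module == nil || pkg.Module.Main {
			continue // skip the standard library and the scanned project itself
		}
		seen[pkg.Module.Path+"@"+pkg.Module.Version] = true
	}

	mods := make([]string, 0, len(seen))
	for m := range seen {
		mods = append(mods, m)
	}
	sort.Strings(mods)
	return mods, nil
}

func main() {
	// Invented sample resembling `go list -deps -json ./...` output.
	const output = `{"ImportPath":"fmt"}
{"ImportPath":"example.com/second-module","Module":{"Path":"example.com/second-module","Version":"v1.1.0"}}
{"ImportPath":"example.com/main-project","Module":{"Path":"example.com/main-project","Main":true}}`

	mods, err := modulesFromGoList(output)
	if err != nil {
		panic(err)
	}
	fmt.Println(mods)
}
```

Because the list is derived from the packages that are actually built, only the selected version of each module appears.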
Edge Cases
Go versions
The implementation must take into account the module's Go version when selecting dependencies. For example, if the module mentions 1.17, building with 1.18 might result in a different set of dependencies being selected. By default, the version in the main module's go.mod file is used. For this iteration, matrix testing should be considered out of scope.
Vendored dependencies
Vendored dependencies can be used to lock in the dependencies that are utilized. The Go mod go directive documentation states that the vendor/modules.txt file records which modules are used. The format looks like the following example.
File: vendor/modules.txt
# github.com/BurntSushi/toml v0.3.1
github.com/BurntSushi/toml
# github.com/Microsoft/go-winio v0.5.1
github.com/Microsoft/go-winio
Each module that the project depends on appears on a line starting with #, followed by the packages used from it, so implementing a parser for this file may be possible. Note, however, that this approach can produce false negatives. See Build tags, operating systems and architectures for an example of this.
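A parser for this format could look roughly like the sketch below. It assumes module headers have the form "# path version", treats lines starting with "##" (such as the "## explicit" annotations newer Go versions write) and package lines as noise, and skips replacement headers without a version. `parseModulesTxt` and `vendoredModule` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// vendoredModule is one module header found in vendor/modules.txt.
type vendoredModule struct{ Path, Version string }

// parseModulesTxt extracts the "# path version" headers from the content
// of a vendor/modules.txt file.
func parseModulesTxt(content string) []vendoredModule {
	var mods []vendoredModule
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "# ") {
			continue // package paths and "## ..." annotations
		}
		fields := strings.Fields(strings.TrimPrefix(line, "# "))
		if len(fields) < 2 || fields[1] == "=>" {
			continue // no version on this header (e.g. a local replacement)
		}
		// Simplification: for "# old v1 => new v2" the original path and
		// version are recorded, not the replacement.
		mods = append(mods, vendoredModule{Path: fields[0], Version: fields[1]})
	}
	return mods
}

func main() {
	const content = `# github.com/BurntSushi/toml v0.3.1
github.com/BurntSushi/toml
# github.com/Microsoft/go-winio v0.5.1
github.com/Microsoft/go-winio
`
	for _, m := range parseModulesTxt(content) {
		fmt.Printf("%s %s\n", m.Path, m.Version)
	}
}
```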
Offline environments
Offline environments are currently supported by the go.sum parser since it does not require any network configuration. On the other hand, the go list command and the golang.org/x/tools/go/packages module both require either a modcache that contains the required modules or a Go project that has been vendored. To prevent a breaking change, we can run go-modlist first and fall back to go.sum parsing if needed. When the go.sum parser runs, the possibility of false positives should be explicitly logged, including a link to this issue. Running go-modlist first offers compatibility for offline environments that have configured the GOPROXY variable to point to an internal mirror, and for repositories that contain vendored dependencies.
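The fallback order described above can be sketched as follows. The two listers are stand-ins invented for illustration (one simulating go-modlist failing offline, one simulating the existing go.sum parser), and `resolveModules` is a hypothetical name, so this shows only the control flow, not the analyzer's real API.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// lister returns "path@version" strings for the project in dir.
type lister func(dir string) ([]string, error)

// resolveModules tries the primary lister first and falls back to the
// secondary one, logging that the fallback may report false positives.
// A real log message would also link to this issue.
func resolveModules(dir string, primary, fallback lister) ([]string, error) {
	mods, err := primary(dir)
	if err == nil {
		return mods, nil
	}
	log.Printf("module listing failed (%v); falling back to go.sum parsing, which may report false positives", err)
	return fallback(dir)
}

// unavailableLister is a hypothetical stand-in for go-modlist in an offline
// environment without a modcache, GOPROXY mirror or vendor directory.
func unavailableLister(dir string) ([]string, error) {
	return nil, errors.New("modules not available offline")
}

// goSumLister is a hypothetical stand-in for the existing go.sum parser.
func goSumLister(dir string) ([]string, error) {
	return []string{
		"example.com/second-module@v1.0.0",
		"example.com/second-module@v1.1.0",
	}, nil
}

func main() {
	mods, err := resolveModules(".", unavailableLister, goSumLister)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(mods)
}
```

Keeping the go.sum parser as the fallback means projects that scan successfully today keep working, at the cost of the known false positives.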
Build tags, operating systems and architectures
Testing the gemnasium project with the go list -deps -json -f solution and go-modlist produces a total of 41 modules. Interestingly enough, vendoring and then parsing vendor/modules.txt produces 43! Diffing the module outputs and then running go mod why -m on the diffed modules shows that they are used by the main module.
Output
diff --git a/modules.txt b/modules_2.txt
index 9170913..69df6ad 100644
--- a/modules.txt
+++ b/modules_2.txt
@@ -1,5 +1,7 @@
github.com/BurntSushi/toml
+github.com/Microsoft/go-winio
github.com/ProtonMail/go-crypto
+github.com/acomagu/bufpipe
github.com/bmatcuk/doublestar
github.com/cpuguy83/go-md2man/v2
github.com/davecgh/go-spew
$ go mod why -m -vendor github.com/acomagu/bufpipe
# github.com/acomagu/bufpipe
gitlab.com/gitlab-org/security-products/analyzers/gemnasium/v2/advisory
gitlab.com/gitlab-org/security-products/analyzers/report/v3
gitlab.com/gitlab-org/security-products/analyzers/ruleset
github.com/go-git/go-git/v5
github.com/go-git/go-git/v5/utils/ioutil
github.com/acomagu/bufpipe
$ go mod why -m -vendor github.com/Microsoft/go-winio
# github.com/Microsoft/go-winio
gitlab.com/gitlab-org/security-products/analyzers/gemnasium/v2/advisory
gitlab.com/gitlab-org/security-products/analyzers/report/v3
gitlab.com/gitlab-org/security-products/analyzers/ruleset
github.com/go-git/go-git/v5
github.com/go-git/go-git/v5/plumbing/transport/client
github.com/go-git/go-git/v5/plumbing/transport/ssh
github.com/xanzy/ssh-agent
github.com/Microsoft/go-winio
This indicates that either go list and go-modlist are both inaccurate, or that the vendor directory contains more packages because it is compatible with different environments. Investigating the build tags for the files that contain these imports shows that the latter is correct! The github.com/go-git/go-git/v5/utils/ioutil/pipe_js.go file will only include github.com/acomagu/bufpipe if the js build tag is set. Similarly, the github.com/xanzy/ssh-agent package will only include github.com/Microsoft/go-winio if the windows tag is set (building in a Windows environment). This may cause false negatives if the build system is Linux-based and the production system runs on Windows.
It would be beneficial to handle this edge case via matrix testing to cover multiple environments. Generating this information and propagating it should be scoped within future iterations.
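To illustrate why the selected dependencies differ per environment, the sketch below evaluates an explicit build constraint line against a set of active tags using the standard go/build/constraint package. The `includedFor` helper is hypothetical, and the example covers only //go:build lines, not constraints implied by file name suffixes such as _js.go.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// includedFor reports whether a file carrying the given "//go:build" line
// would be compiled when exactly the given tags are active.
func includedFor(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	active := map[string]bool{}
	for _, t := range tags {
		active[t] = true
	}
	return expr.Eval(func(tag string) bool { return active[tag] }), nil
}

func main() {
	// A file guarded like this only contributes its imports on Windows.
	const line = "//go:build windows"
	for _, goos := range []string{"linux", "windows"} {
		ok, err := includedFor(line, goos)
		if err != nil {
			panic(err)
		}
		fmt.Printf("GOOS=%s: file included = %v\n", goos, ok)
	}
}
```

A scan that resolves modules on Linux would therefore never see an import that sits behind a windows constraint, which is the false negative described above.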
Implementation plan
- Investigate edge cases, and document them in this issue
- Document limitations and edge cases in user docs
  - Document build tags, operating system and architecture edge case. Link Allow go builder configuration (#371779 - closed) which tracks support for specifying build constraints.
  - Document offline environment requirements for running new go module resolution.
- Update gemnasium: Add golang builder to gemnasium and sbomgen-gol... (gitlab-org/security-products/analyzers/gemnasium!392 - merged)
  - Add a builder that utilizes the golang.org/x/tools/go/packages module.
  - Update the Go Parse function so that it first attempts to parse a list of modules and if unsuccessful uses the go.sum parser.
  - Add unit tests for the cases where a go-project-modules.json file is present instead of a go.sum.
  - Update the Dockerfiles for Gemnasium so that they include the Go toolchain. To avoid dramatically increasing the size of the build, the packages should only be downloaded and installed upon launching the analyzer. The following Dockerfiles serve as a basis for enabling this functionality.
  - Update integration tests based on the go-modules test project
  - Release new version
- Create release post