Incorrect usage of go.sum in go dependency scanning
Release notes
TBD
Summary
We recently started using the awesome dependency scanning feature, especially for Go projects.
When using the scanner, we often get false positives, because the scanner does not recognize updated dependencies in the go.mod file.
When updating a dependency and its transitive dependencies via go get -u, the go.sum file ends up containing both the old and the new version of a dependency, even though only the newly specified version is actually used in the build.
Even go mod tidy will not clean up the file here if a transitive dependency specifies a different version in its go.mod file.
This leads to false positives during the dependency scan.
In my opinion, parsing go.sum to find all used packages is not fully correct, as multiple versions of the same package will show up.
Steps to reproduce
Create 3 Go projects:
- second-module (v1.0.0 and v1.1.0): contains exposed module function Second()
- first-module (v1.0.0): uses Second() function from second-module and exposes it as First(), specifies v1.0.0 of second-module in its go.mod
- main-project: calls First() function of first-module
Finally, call go get -u first-module in the main-project, which will update first-module and all its dependencies.
Take note of the updated go.sum file in main-project. This one will now contain references to both v1.0.0 and v1.1.0 of the second-module, while only v1.1.0 is actually used.
This will lead to false-positives in the dependency scanning.
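To make the effect visible, here is a minimal sketch that reads go.sum content and reports modules that appear with more than one version. The module paths, the placeholder hashes and the helper name versionsInGoSum are assumptions made up for illustration; this is not the scanner's actual parser.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// versionsInGoSum returns, for every module path found in go.sum content,
// the distinct versions mentioned. A "/go.mod" suffix on the version is
// dropped, so "v1.0.0" and "v1.0.0/go.mod" count as the same version.
func versionsInGoSum(content string) map[string][]string {
	seen := map[string]map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 3 {
			continue // not a "<module> <version> <hash>" line
		}
		path := fields[0]
		version := strings.TrimSuffix(fields[1], "/go.mod")
		if seen[path] == nil {
			seen[path] = map[string]bool{}
		}
		seen[path][version] = true
	}
	out := map[string][]string{}
	for path, versions := range seen {
		for v := range versions {
			out[path] = append(out[path], v)
		}
		sort.Strings(out[path])
	}
	return out
}

// sampleGoSum is hypothetical go.sum content; the hashes are placeholders.
const sampleGoSum = `example.com/first-module v1.0.0 h1:placeholder=
example.com/first-module v1.0.0/go.mod h1:placeholder=
example.com/second-module v1.0.0/go.mod h1:placeholder=
example.com/second-module v1.1.0 h1:placeholder=
example.com/second-module v1.1.0/go.mod h1:placeholder=
`

func main() {
	for path, versions := range versionsInGoSum(sampleGoSum) {
		if len(versions) > 1 {
			fmt.Printf("%s appears with %d versions: %v\n", path, len(versions), versions)
		}
	}
}
```

A scanner that treats every go.sum entry as a used dependency would report both versions of the second module here, which is the false positive described above.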
Example Project
tbd
What is the current bug behavior?
False positives in the go dependency scanning, as old unused imports are scanned as well.
What is the expected correct behavior?
Only actually used modules are scanned and will show up in the scan results
Proposed fix
Utilize golang.org/x/tools/go/packages
| Pros | Cons |
|---|---|
| Outputs the modules required by the main module (the module of the repository scanned). | Requires the Go toolchain to be installed. The packaged version of this is around 100MB. |
| Can report or filter out test dependencies. | Requires a goproxy to be set up for offline module downloads, vendored dependencies, or a modcache with all dependencies preloaded. |
Please let me know what you think, also let me know if I misunderstood the behavior here. Thanks a lot for this awesome tool!
Older proposals
Keep the highest version of module
The correct way would probably be to parse the go.mod file and all its references recursively, merging versions specified on upper levels into lower levels accordingly. => gitlab-org/security-products/analyzers/gemnasium!218 (closed)
| Pros | Cons |
|---|---|
| Works in online and offline environments. | Can produce false negatives i.e. missed vulnerabilities. |
| Does not require Go toolchain to be installed. | |
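The "keep the highest version" idea can be sketched roughly as follows. This is a simplified illustration, not the code from the linked merge request: the helper names are hypothetical, and the version comparison only understands plain vMAJOR.MINOR.PATCH strings (pre-release tags and pseudo-versions are skipped). A real implementation would more likely rely on a semantic version library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "vMAJOR.MINOR.PATCH" into numbers. Simplification:
// pre-release tags, build metadata and pseudo-versions are not handled.
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// less reports whether version a is lower than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// highestVersions keeps only the highest parsable version of each module.
func highestVersions(versions map[string][]string) map[string]string {
	best := map[string]string{}
	for path, list := range versions {
		for _, v := range list {
			cur, ok := parseVersion(v)
			if !ok {
				continue // unsupported version format in this sketch
			}
			prev, havePrev := parseVersion(best[path])
			if !havePrev || less(prev, cur) {
				best[path] = v
			}
		}
	}
	return best
}

func main() {
	// Hypothetical input: every version of a module mentioned in go.sum.
	fromGoSum := map[string][]string{
		"example.com/second-module": {"v1.0.0", "v1.1.0"},
	}
	fmt.Println(highestVersions(fromGoSum))
}
```

The sketch also shows why this approach is only a heuristic: it assumes the highest version listed is the one the build selects, without ever consulting the module graph.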
Utilize go list tool
Also, there is go list -json -deps, which looks promising and might be a better fit. => gitlab-org/security-products/analyzers/gemnasium!203 (closed)
| Pros | Cons |
|---|---|
| Outputs the modules required by the main module (the module of the repository scanned). | Requires the Go toolchain to be installed. The packaged version of this is around 100MB. |
| Can report or filter out test dependencies. | Requires a goproxy to be set up for offline module downloads, vendored dependencies, or a modcache with all dependencies preloaded. |
MR: List go dependencies using go list (gitlab-org/security-products/analyzers/gemnasium!203 - closed)
Edge Cases
Go versions
The implementation must take into account the module's Go version when selecting dependencies. For example, if the module declares Go 1.17, building with Go 1.18 might result in a different set of dependencies being selected. By default, the version in the main module's go.mod file is used. For this iteration, matrix testing should be considered out of scope.
Vendored dependencies
Vendored dependencies can be used to lock in the dependencies that are utilized. According to the Go mod go directive documentation, the vendor/modules.txt file records which modules are used. The format looks like the following example.
File: vendor/modules.txt
# github.com/BurntSushi/toml v0.3.1
github.com/BurntSushi/toml
# github.com/Microsoft/go-winio v0.5.1
github.com/Microsoft/go-winio
Each module that the project depends on is listed on a line starting with #, followed by the packages used from that module, so implementing a parser for this file may be possible. However, it should be noted that it is possible for this to produce false negatives. See Build tags, operating systems and architectures for an example of this.
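A parser for this file could look roughly like the sketch below. It assumes the layout shown in the example: a "# <module> <version>" line per module, followed by the vendored package paths. Lines starting with "##" are skipped, and module lines without a version (as written for some replace directives) are ignored. The type and function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// vendoredModule is one module entry from vendor/modules.txt.
type vendoredModule struct {
	Path     string
	Version  string
	Packages []string
}

// parseModulesTxt extracts modules and their vendored packages.
func parseModulesTxt(content string) []vendoredModule {
	var mods []vendoredModule
	inModule := false // true while package lines belong to a parsed module
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "##"):
			// Blank lines and "## ..." annotation lines carry no module.
		case strings.HasPrefix(line, "# "):
			fields := strings.Fields(line[2:])
			// Simplification: only "# <path> <version>" lines are kept.
			inModule = len(fields) >= 2 && strings.HasPrefix(fields[1], "v")
			if inModule {
				mods = append(mods, vendoredModule{Path: fields[0], Version: fields[1]})
			}
		case inModule:
			last := &mods[len(mods)-1]
			last.Packages = append(last.Packages, line)
		}
	}
	return mods
}

const sampleModulesTxt = `# github.com/BurntSushi/toml v0.3.1
github.com/BurntSushi/toml
# github.com/Microsoft/go-winio v0.5.1
github.com/Microsoft/go-winio
`

func main() {
	for _, m := range parseModulesTxt(sampleModulesTxt) {
		fmt.Printf("%s %s (%d packages)\n", m.Path, m.Version, len(m.Packages))
	}
}
```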
Offline environments
Offline environments are currently supported by the go.sum parser since it does not require any network configuration. On the other hand, the go list command and the golang.org/x/tools/go/packages module both require the modcache to contain the required modules or a Go project that has been vendored. To prevent a breaking change, we can run go-modlist first and fall back to go.sum parsing if needed. When the go.sum parser runs, the possibility of false positives should be explicitly logged, and the log message should include a link to this issue. Running go-modlist first offers compatibility for offline environments that have configured the GOPROXY variable to point to an internal mirror and for repositories that contain vendored dependencies.
Build tags, operating systems and architectures
Testing the gemnasium project with the go list -deps -json -f solution and go-modlist produces a total of 41 modules. Interestingly enough, vendoring then parsing vendor/modules.txt produces 43! Diffing the module outputs and then running go mod why -m on the diffed modules shows that they are used by the main module.
Output
diff --git a/modules.txt b/modules_2.txt
index 9170913..69df6ad 100644
--- a/modules.txt
+++ b/modules_2.txt
@@ -1,5 +1,7 @@
github.com/BurntSushi/toml
+github.com/Microsoft/go-winio
github.com/ProtonMail/go-crypto
+github.com/acomagu/bufpipe
github.com/bmatcuk/doublestar
github.com/cpuguy83/go-md2man/v2
github.com/davecgh/go-spew
$ go mod why -m -vendor github.com/acomagu/bufpipe
# github.com/acomagu/bufpipe
gitlab.com/gitlab-org/security-products/analyzers/gemnasium/v2/advisory
gitlab.com/gitlab-org/security-products/analyzers/report/v3
gitlab.com/gitlab-org/security-products/analyzers/ruleset
github.com/go-git/go-git/v5
github.com/go-git/go-git/v5/utils/ioutil
github.com/acomagu/bufpipe
$ go mod why -m -vendor github.com/Microsoft/go-winio
# github.com/Microsoft/go-winio
gitlab.com/gitlab-org/security-products/analyzers/gemnasium/v2/advisory
gitlab.com/gitlab-org/security-products/analyzers/report/v3
gitlab.com/gitlab-org/security-products/analyzers/ruleset
github.com/go-git/go-git/v5
github.com/go-git/go-git/v5/plumbing/transport/client
github.com/go-git/go-git/v5/plumbing/transport/ssh
github.com/xanzy/ssh-agent
github.com/Microsoft/go-winio
This indicates that either go list and go-modlist are both inaccurate, or the vendor directory contains more packages because it is compatible with different environments. Investigating the build tags for the files that contain these imports shows that the latter is correct! The github.com/go-git/go-git/v5/utils/ioutil/pipe_js.go file will only include github.com/acomagu/bufpipe if the js build tag is set. Similarly, the github.com/xanzy/ssh-agent package will only include github.com/Microsoft/go-winio if the windows tag is set (building in a Windows environment). This may cause false negatives if the build system is Linux based and the production system runs on Windows.
It would be beneficial to handle this edge case via matrix testing to cover multiple environments. Generating this information and propagating it should be scoped within future iterations.
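To illustrate how build constraints change which files (and therefore which imports) are part of a build, the sketch below evaluates two //go:build lines against a Linux and a Windows tag set using the standard go/build/constraint package. The tag sets and the enabled helper are assumptions for illustration. Constraints implied by file names, such as the _js.go suffix, are not covered here.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// enabled reports whether a build constraint line is satisfied by the
// given set of build tags.
func enabled(line string, tags map[string]bool) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool { return tags[tag] }), nil
}

func main() {
	// Hypothetical tag sets for a Linux build host and a Windows target.
	linux := map[string]bool{"linux": true, "amd64": true}
	windows := map[string]bool{"windows": true, "amd64": true}

	for _, line := range []string{"//go:build js", "//go:build windows"} {
		onLinux, err := enabled(line, linux)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		onWindows, err := enabled(line, windows)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-20s linux=%v windows=%v\n", line, onLinux, onWindows)
	}
}
```

A file guarded by the windows constraint is excluded when resolving dependencies on a Linux host, so a module such as go-winio would not be reported there even though a Windows build would include it.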
Implementation plan
- Investigate edge cases, and document them in this issue
- Document limitations and edge cases in user docs
  - Document build tags, operating system and architecture edge case. Link Allow go builder configuration (#371779 - closed) which tracks support for specifying build constraints.
  - Document offline environment requirements for running new go module resolution.
- Update gemnasium: Add golang builder to gemnasium and sbomgen-gol... (gitlab-org/security-products/analyzers/gemnasium!392 - merged)
  - Add a builder that utilizes the golang.org/x/tools/go/packages module.
  - Update the Go Parse function so that it first attempts to parse a list of modules and, if unsuccessful, uses the go.sum parser.
  - Add unit tests for the cases where a go-project-modules.json file is present instead of a go.sum.
  - Update the Dockerfiles for Gemnasium so that they include the Go toolchain. To avoid dramatically increasing the size of the build, the packages should only be downloaded and installed upon launching the analyzer. The following Dockerfiles serve as a basis for enabling this functionality.
  - Update integration tests based on the go-modules test project
  - Release new version
- Create release post