Handle requirements.txt files produced by pip-compile as lock files
<!--
Implementation issues are used to break up a large piece of work into small, discrete tasks that can move independently through the build workflow steps. They're typically used to populate a Feature Epic. Once created, an implementation issue is usually refined in order to populate and review the implementation plan and weight.

Example workflow: https://about.gitlab.com/handbook/engineering/development/threat-management/planning/diagram.html#plan
-->

## Why are we doing this work

<!--
A brief explanation of the why, not the what or how. Assume the reader doesn't know the background and won't have time to dig up information from comment threads.
-->

Dependency Scanning assumes that [`requirements.txt` files](https://pip.pypa.io/en/stable/reference/requirements-file-format/) do not include the entire dependency graph, and will always attempt to build a project when one is detected. This decision was made consciously: while it's possible for a project to export a complete list of dependencies into the file, it was not _guaranteed_ to be the case. To err on the side of caution, the analyzer was configured to take the safe route and build the project, i.e. install all of its dependencies.

This decision came with some trade-offs:

- Offline and limited-network installations require either preloading all the dependencies into the package cache or a configured Python registry/proxy.
- The analyzer needs to build the project, which introduces some complexity in terms of Python version compatibility.

With the introduction of the `pip-tools` suite, it's now possible to easily generate a _complete_ dependency graph export in the form of a `requirements.txt` file. As a result, many projects have started to adopt this approach and rely solely on `pip` to install from this file. Dependency Scanning should take advantage of this movement as well, and aim to provide support for projects that have adopted this workflow.
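For reference, `pip-compile` writes a banner comment at the top of the files it generates and pins every dependency, including transitive ones, to an exact version. A typical output looks like the following (package names are illustrative):

```
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile requirements.in
#
certifi==2023.7.22
    # via requests
requests==2.31.0
    # via -r requirements.in
```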
It will not only increase the range of use cases Dependency Scanning supports, but it will also reduce setup complexity for projects that use this workflow in an offline environment, and even decrease the time and network bandwidth used when building the project.

## Relevant links

<!--
Information that the developer might need to refer to when implementing the issue.

- [Design Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/<id>)
- [Design 1](https://gitlab.com/gitlab-org/gitlab/-/issues/<id>/designs/<image>.png)
- [Design 2](https://gitlab.com/gitlab-org/gitlab/-/issues/<id>/designs/<image>.png)
- [Similar implementation](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/<id>)
-->

- [Example `pip-compile` workflow](http://archive.today/6Mz2A)
- [`pip-tools` documentation](https://pip-tools.readthedocs.io/en/latest/#requirements-from-pyproject-toml)

## Non-functional requirements

<!--
Add details for required items and delete others.
-->

- [x] Documentation: The documentation in [Dependency Scanning](https://docs.gitlab.com/ee/user/application_security/dependency_scanning/) will need to reflect the new strategy for scanning `requirements.txt` files.
- [ ] Feature flag:
- [ ] Performance:
- [x] Testing: Unit tests and integration tests needed.
  - [ ] Confirm correctness of parsing a `requirements.txt` file.
  - [ ] Confirm that the analyzer _will_ build `requirements.txt` files if they're _not_ generated by `pip-compile`.
  - [ ] Confirm that custom `requirements.txt` files are still handled. This is configured using the `PIP_REQUIREMENTS_FILE` environment variable.

## Implementation plan

<!--
Steps and the parts of the code that will need to get updated. The plan can also call out responsibilities for other team members or teams and can be split into smaller MRs to simplify the code review process.
e.g.:

- MR 1: Part 1
  - [ ] ~frontend Step 1
  - [ ] ~frontend Step 2
- MR 2: Part 2
  - [ ] ~backend Step 1
  - [ ] ~backend Step 2
- MR 3: Part 3
  - [ ] ~frontend Step 1
  - [ ] ~frontend Step 2
-->

- [ ] MR 1: Create a new parser that can parse a `pip-compile` requirements file.
  - [ ] Create a directory named `pip-compile` in the [`scanner/parser/`](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/tree/master/scanner/parser) directory. The directory structure will look like the figure below. The `expect/` directory holds expectations we compare against in tests, `fixtures/` holds source files we use in tests, e.g. `requirements.txt`, and the Go files hold the code related to the parser.

    ```
    ├── expect
    ├── fixtures
    ├── pip_compile.go
    └── pip_compile_test.go
    ```

  - [ ] Implement a parser that parses the versions of packages used in the requirements file.
  - [ ] Register the parser so that it scans the `requirements.txt` file. An example of how this is done for the `golang` parser can be found [here](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/f52965010d4d69d7bca39c00f9bc52cd65f7e0e7/scanner/parser/golang/golang.go#L78-84).
- [ ] MR 2: Update the [pip builder](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/master/builder/pip/pip.go) so that it returns a non-fatal error if it matches a `pip-compile` `requirements.txt`.
  - [ ] You can return a non-fatal error by using `builder.NewNonFatalError`, which is defined [here](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/master/builder/builder.go#L24-27).
  - [ ] A heuristic will be needed to create a simple but effective solution that detects `pip-compile` files. One way to do this could be to use a buffered IO reader that scans the file line by line, looking for the well-known `pip-compile` comment left in the output files:
    - `# This file is autogenerated by pip-compile with Python`
  - [ ] If the heuristic matches, the builder should return the non-fatal error with a message like `Python pip project not built. A requirements file generated by pip-compile was detected.` This will give customers and team members better insight into _why_ the build was skipped.
  - [ ] Add specs that test these scenarios. Our integration tests for this are stored in `spec/gemnasium-python_integration_spec.rb`. They use `rspec` and the [`integration-test`](https://gitlab.com/gitlab-org/security-products/analyzers/integration-test/-/tree/main/) project to test the various scenarios.

<!--
Workflow and other relevant labels

# ~"group::" ~"Category:" ~"GitLab Ultimate"

Other settings you might want to include when creating the issue.

# /assign @
# /epic &
-->

## Verification steps

<!--
Add verification steps to help GitLab team members test the implementation. This is particularly useful during the MR review and the ~"workflow::verification" step. You may not know exactly what the verification steps should be during issue refinement, so you can always come back later to add them.

1. Check-out the corresponding branch
1. ...
1. Profit!
-->

1. Create a project with a `requirements.txt` file that is produced using `pip-compile`.
2. Test that this works when running the tests offline. This is a quick test that confirms the project is not built, and as a result no dependencies are fetched from the network.

   ```shell
   $ docker build -f build/gemnasium-python/redhat/Dockerfile -t gemnasium-python:latest .
   $ docker run --rm -it -v "$TEST_PROJECT_SRC:/app" -w /app -e SECURE_LOG_LEVEL=debug gemnasium-python:latest
   ```

3. Verify that the dependency scanning report and SBoM contain the expected dependencies with the right attributions.