Handle requirements.txt files produced by pip-compile as lock files
<!--
Implementation issues are used to break up a large piece of work into small, discrete tasks that can move independently through the build workflow steps. They're typically used to populate a Feature Epic. Once created, an implementation issue is usually refined in order to populate and review the implementation plan and weight.

Example workflow: https://about.gitlab.com/handbook/engineering/development/threat-management/planning/diagram.html#plan
-->

## Why are we doing this work

<!--
A brief explanation of the why, not the what or how. Assume the reader doesn't know the background and won't have time to dig up information from comment threads.
-->

Dependency Scanning assumes that [`requirements.txt` files](https://pip.pypa.io/en/stable/reference/requirements-file-format/) do not include the entire dependency graph, and will always attempt to build a project when one is detected. This decision was made consciously: while it's possible for a project to export a complete list of dependencies into the file, it was not _guaranteed_ to be the case. To err on the side of caution, the analyzer was configured to take the safe route and build the project, i.e. install all of its dependencies.

This decision came with some trade-offs:

- Offline and limited-network installations require either preloading all the dependencies into the package cache or a configured Python registry/proxy.
- The analyzer needs to build the project, which introduces some complexity in terms of Python version compatibility.

With the introduction of the `pip-tools` suite, it's now possible to easily generate a _complete_ dependency graph export in the form of a `requirements.txt` file. As a result, many projects have started to adopt this approach and rely solely on `pip` to install from this file. Dependency Scanning should take advantage of this movement as well, and aim to provide support for projects that have adopted this workflow.
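For reference, `pip-compile` writes a banner comment at the top of the files it generates and pins every dependency, including transitive ones, to an exact version. A typical output looks like the following (package names are illustrative):

```
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile requirements.in
#
certifi==2023.7.22
    # via requests
requests==2.31.0
    # via -r requirements.in
```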
It will not only increase the range of use cases Dependency Scanning supports, but it will also reduce setup complexity for projects that use this workflow in an offline environment, and even decrease the time and network bandwidth used when building the project.

## Relevant links

<!--
Information that the developer might need to refer to when implementing the issue.

- [Design Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/<id>)
- [Design 1](https://gitlab.com/gitlab-org/gitlab/-/issues/<id>/designs/<image>.png)
- [Design 2](https://gitlab.com/gitlab-org/gitlab/-/issues/<id>/designs/<image>.png)
- [Similar implementation](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/<id>)
-->

- [Example `pip-compile` workflow](http://archive.today/6Mz2A)
- [`pip-tools` documentation](https://pip-tools.readthedocs.io/en/latest/#requirements-from-pyproject-toml)

## Non-functional requirements

<!--
Add details for required items and delete others.
-->

- [x] Documentation: The documentation in [Dependency Scanning](https://docs.gitlab.com/ee/user/application_security/dependency_scanning/) will need to reflect the new strategy for scanning `requirements.txt` files.
- [ ] Feature flag:
- [ ] Performance:
- [x] Testing: Unit tests and integration tests needed.
  - [ ] Confirm correctness of parsing a `requirements.txt` file.
  - [ ] Confirm that the analyzer _will_ build `requirements.txt` files if they're _not_ generated by `pip-compile`.
  - [ ] Confirm that custom `requirements.txt` files are still handled. This is configured using the `PIP_REQUIREMENTS_FILE` environment variable.

## Implementation plan

<!--
Steps and the parts of the code that will need to get updated. The plan can also call out responsibilities for other team members or teams and can be split into smaller MRs to simplify the code review process.
e.g.:

- MR 1: Part 1
  - [ ] ~frontend Step 1
  - [ ] ~frontend Step 2
- MR 2: Part 2
  - [ ] ~backend Step 1
  - [ ] ~backend Step 2
- MR 3: Part 3
  - [ ] ~frontend Step 1
  - [ ] ~frontend Step 2
-->

- [ ] MR 1: Create a new parser that can parse a `pip-compile` requirements file.
  - [ ] Create a directory named `pip-compile` in the [`scanner/parser/`](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/tree/master/scanner/parser) directory. The directory structure will look like the figure below. The `expect/` directory holds expectations we compare against in tests, `fixtures/` holds source files we use in tests, e.g. `requirements.txt`, and the Go files hold the code related to the parser.

    ```
    ├── expect
    ├── fixtures
    ├── pip_compile.go
    └── pip_compile_test.go
    ```

  - [ ] Implement a parser that parses the versions of packages used in the requirements file.
  - [ ] Register the parser so that it scans the `requirements.txt` file. An example of how this is done for the `golang` parser can be found [here](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/f52965010d4d69d7bca39c00f9bc52cd65f7e0e7/scanner/parser/golang/golang.go#L78-84).
- [ ] MR 2: Update the [pip builder](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/master/builder/pip/pip.go) so that it returns a non-fatal error if it matches a `pip-compile` `requirements.txt`.
  - [ ] You can return a non-fatal error by using `builder.NewNonFatalError`, which is defined [here](https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/master/builder/builder.go#L24-27).
  - [ ] A heuristic will be needed to create a simple but effective solution that detects `pip-compile` files. One way to do this could be to use a buffered IO reader that scans the file line by line, looking for the well-known `pip-compile` comment left in the output files:
    - `# This file is autogenerated by pip-compile with Python`
  - [ ] If the heuristic matches, the builder should return the non-fatal error with a message like `Python pip project not built. A requirements file generated by pip-compile was detected.` This will give customers and team members better insight into _why_ the build was skipped.
  - [ ] Add specs that test these scenarios. Our integration tests for this are stored in `spec/gemnasium-python_integration_spec.rb`. They use `rspec` and the [`integration-test`](https://gitlab.com/gitlab-org/security-products/analyzers/integration-test/-/tree/main/) project to test the various scenarios.

<!--
Workflow and other relevant labels

# ~"group::" ~"Category:" ~"GitLab Ultimate"

Other settings you might want to include when creating the issue.

# /assign @
# /epic &
-->

## Verification steps

<!--
Add verification steps to help GitLab team members test the implementation. This is particularly useful during the MR review and the ~"workflow::verification" step. You may not know exactly what the verification steps should be during issue refinement, so you can always come back later to add them.

1. Check-out the corresponding branch
1. ...
1. Profit!
-->

1. Create a project with a `requirements.txt` file that is produced using `pip-compile`.
2. Test that this works when running the tests offline. This is a quick test that confirms the project is not built, and as a result no dependencies are fetched from the network.

   ```shell
   $ docker build -f build/gemnasium-python/redhat/Dockerfile -t gemnasium-python:latest .
   $ docker run --rm -it -v "$TEST_PROJECT_SRC:/app" -w /app -e SECURE_LOG_LEVEL=debug gemnasium-python:latest
   ```

3. Verify that the dependency scanning report and SBoM contain the expected dependencies with the right attributions.