Handle requirements.txt files produced by pip-compile as lock files

Why are we doing this work

Dependency Scanning assumes that requirements.txt files do not include the entire dependency graph, and always attempts to build a project when one is detected. This decision was made consciously: while it's possible for a project to export a complete list of dependencies to the file, this is not guaranteed. To err on the side of caution, the analyzer was configured to take the safe route and build the project, i.e. install all of its dependencies. This decision carries some tradeoffs:

  • Offline and limited-network installations require either preloading all the dependencies into the package cache or configuring a Python registry/proxy.
  • The analyzer needs to build the project, which introduces some complexity in terms of Python version compatibility.

With the introduction of the pip-tools suite, it's now easy to generate a complete dependency graph export in the form of a requirements.txt file. As a result, many projects have adopted this approach and rely solely on pip to install from this file. Dependency Scanning should take advantage of this trend and support projects that have adopted this workflow. Doing so will widen the range of supported use cases, reduce the setup complexity for these projects in offline environments, and decrease the time and network bandwidth spent building the project.

Relevant links

Non-functional requirements

  • Documentation: The documentation in Dependency Scanning will need to reflect the new strategy for scanning requirements.txt files.
  • Feature flag:
  • Performance:
  • Testing: Unit tests and integration tests needed.
    • Confirm correctness of parsing a requirements.txt file.
    • Confirm that the analyzer still builds the project when a requirements.txt file was not produced by pip-compile.
    • Confirm that we will still handle custom requirements.txt files. This is configured using the PIP_REQUIREMENTS_FILE env variable.

Implementation plan

  • MR 1: Create a new parser that can parse a pip-compile requirements file.
    • Create a directory named pip-compile in the scanner/parser/ directory. The directory structure will look like the figure below. The expect/ directory holds expectations we compare against in tests, fixtures/ is source files we use in tests, e.g. requirements.txt, and the Go files hold the code related to the parser.

      ├── expect
      ├── fixtures
      ├── pip_compile.go
      └── pip_compile_test.go
    • Implement a parser that parses the versions of packages used in the requirements file.

    • Register the parser so that it scans the requirements.txt file. An example of how this is done for the golang parser can be found here.
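The parser described in MR 1 could be sketched roughly as below. This is an illustrative, self-contained sketch, not the analyzer's actual code: the `Package` type, the `parseRequirements` function, and the handling of comments and pip options are all assumptions about the eventual implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Package is a hypothetical type holding one pinned dependency.
type Package struct {
	Name    string
	Version string
}

// parseRequirements extracts "name==version" pins from pip-compile output.
// It skips blank lines, comment lines, and pip options (lines starting
// with "-", e.g. --hash or -e), and drops trailing inline comments.
func parseRequirements(input string) []Package {
	var pkgs []Package
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "-") {
			continue
		}
		// Drop a trailing inline comment, if any.
		if i := strings.Index(line, "#"); i >= 0 {
			line = strings.TrimSpace(line[:i])
		}
		parts := strings.SplitN(line, "==", 2)
		if len(parts) != 2 {
			// pip-compile pins everything with "=="; anything else is ignored here.
			continue
		}
		pkgs = append(pkgs, Package{Name: parts[0], Version: strings.TrimSpace(parts[1])})
	}
	return pkgs
}

func main() {
	input := `# This file is autogenerated by pip-compile with Python 3.11
requests==2.31.0
    # via some-package
urllib3==2.0.4
`
	for _, p := range parseRequirements(input) {
		fmt.Printf("%s %s\n", p.Name, p.Version)
	}
	// Output:
	// requests 2.31.0
	// urllib3 2.0.4
}
```

The real parser would also need to decide how to handle edge cases such as environment markers and hash continuation lines, which this sketch glosses over.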

  • MR 2: Update the pip builder so that it returns a non-fatal error if it matches a pip-compile requirements.txt.
    • You can return a non-fatal error by using builder.NewNonFatalError, which is defined here.
    • A simple but effective heuristic is needed to detect pip-compile files. One approach is to use a buffered I/O reader that scans the file line by line, looking for the well-known comment that pip-compile leaves at the top of its output files.
      • # This file is autogenerated by pip-compile with Python
    • If the heuristic matches, the builder should return the non-fatal error with a message like Python pip project not built. A requirements file built by pip-compile detected. This gives customers and team members better insight into why the build was skipped.
    • Add specs that test these scenarios. Our integration tests for this are stored in the spec/gemnasium-python_integration_spec.rb. They utilize rspec and the integration-test project to test the various scenarios.

Verification steps

  1. Create a project with a requirements.txt file that is produced using pip-compile.
  2. Run the tests offline. This is a quick test that confirms the project is not built and, as a result, no dependencies are fetched from the network.
$ docker build -f build/gemnasium-python/redhat/Dockerfile -t gemnasium-python:latest .
$ docker run --rm -it -v "$TEST_PROJECT_SRC:/app" -w /app -e SECURE_LOG_LEVEL=debug gemnasium-python:latest
  3. Verify that the dependency scanning report and SBoM contain the expected dependencies with the right attributions.
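For step 1, the test project's file should have the shape of real pip-compile output. The package names and versions below are purely illustrative:

```
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile requirements.in
#
requests==2.31.0
    # via -r requirements.in
urllib3==2.0.4
    # via requests
```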