Use lowercase package names in python-pip test expectation

Problem

The gemnasium master and v5 pipelines are currently failing because the expectations of the python-pip-qa job no longer match the generated report:

json atom at path "vulnerabilities/0/location/dependency/package/name" is not equal to expected value:
  expected: "Django"
       got: "django"

This is because the gemnasium-python-dependency_scanning job now produces lowercase package names. Notice that the package name is django:

{
  "version": "15.1.4",
  "vulnerabilities": [
    {
      "snip...",
      "location": {
        "file": "requirements.txt",
        "dependency": {
          "package": {
            "name": "django"
          },
          "version": "1.11.4"
        }

In previous successful pipelines, the gemnasium-python-dependency_scanning job produced package names with their original capitalization (called "uppercase" below). Notice that the package name is Django:

{
  "version": "15.1.4",
  "vulnerabilities": [
    {
      "snip...",
      "location": {
        "file": "requirements.txt",
        "dependency": {
          "package": {
            "name": "Django"
          },
          "version": "1.11.4"
        }

This change in behaviour comes from the way our test job runs. The python-pip-qa job triggers a downstream pipeline in tests/python-pip:

python-pip-qa:
  extends: .functional
  <snip>
  trigger:
    project: gitlab-org/security-products/tests/python-pip

This runs the tests/python-pip/.gitlab-ci.yml template, which executes the build job:

build:
  image: python:$DS_PYTHON_VERSION
  stage: build
  script:
    - pip wheel --no-binary=':all:' --wheel-dir ./dist -r requirements.txt
    <snip>

This job builds all dependencies into wheels from scratch and stores them in a ./dist directory. The issue occurs within this build job.

If we look at the last build job that produced uppercase package names, we see that it was using the following python:3.11 image:

Using docker image sha256:a31a35e4126639ba5bf56342b284003aef1bf0e4d48707126b215f64d1da4a7a for python:3.11 with digest python@sha256:16e9aaa3fdc728525d4bf3ce02fc311a18c87222005facc3e3c2a5795d297fe1 ...

whereas the most recent build job, which produces lowercase package names, used:

Using docker image sha256:cc3a4dfabf81638313e38b7fbce9bd1eb0566b4e8029917ae5a5ab4dae1b2f81 for python:3.11 with digest python@sha256:993edd4388f0b3929b79fd954c8878ca81ee6bc6def9375b78d06d2af1905ee3 ...

If we inspect the version of setuptools used in both of these images, we see it's changed:

  • The image that produces uppercase package names uses setuptools 65.5.1:

    $ docker run python@sha256:16e9aaa3fdc728525d4bf3ce02fc311a18c87222005facc3e3c2a5795d297fe1 pip show setuptools
    
    Name: setuptools
    Version: 65.5.1
  • The image that produces lowercase package names uses setuptools 79.0.1:

    $ docker run python@sha256:993edd4388f0b3929b79fd954c8878ca81ee6bc6def9375b78d06d2af1905ee3 pip show setuptools
    
    Name: setuptools
    Version: 79.0.1

So something has changed between setuptools 65.5.1 and setuptools 79.0.1 that results in lowercased package names.

After further investigation, it turns out the change occurred in setuptools 75.8.1 as a result of "Fix wheel file naming to follow binary distribution specification". The following script reproduces the old and new safer_name behaviour:

#!/usr/bin/env python3

import re

_UNSAFE_NAME_CHARS = re.compile(r"[^A-Z0-9._-]+", re.I)

def safe_name(component: str) -> str:
    """Escape a component used as a project name according to Core Metadata.
    >>> safe_name("hello world")
    'hello-world'
    >>> safe_name("hello?world")
    'hello-world'
    >>> safe_name("hello_world")
    'hello_world'
    """
    # See pkg_resources.safe_name
    return _UNSAFE_NAME_CHARS.sub("-", component)

def filename_component(value: str) -> str:
    """Normalize each component of a filename (e.g. distribution/version part of wheel)
    Note: ``value`` needs to be already normalized.
    >>> filename_component("my-pkg")
    'my_pkg'
    """
    return value.replace("-", "_").strip("_")

def new_safer_name(value: str) -> str:
    """Like ``safe_name`` but can be used as filename component for wheel"""
    # See bdist_wheel.safer_name
    return (
        # Per https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
        re.sub(r"[-_.]+", "-", safe_name(value))
        .lower()
        # Per https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
        .replace("-", "_")
    )

def old_safer_name(value: str) -> str:
    """Like ``safe_name`` but can be used as filename component for wheel"""
    # See bdist_wheel.safer_name
    return filename_component(safe_name(value))


package_name = "Pillow.test-3.3.1-cp311-cp311-linux_aarch64.whl"
print(f"Original package name: '{package_name}'")
print(f"Using safer name from setuptools 75.8.0: '{old_safer_name(package_name)}'")
print(f"Using safer name from setuptools 75.8.1: '{new_safer_name(package_name)}'")

Output:

Original package name: 'Pillow.test-3.3.1-cp311-cp311-linux_aarch64.whl'
Using safer name from setuptools 75.8.0: 'Pillow.test_3.3.1_cp311_cp311_linux_aarch64.whl'
Using safer name from setuptools 75.8.1: 'pillow_test_3_3_1_cp311_cp311_linux_aarch64_whl'

Notice that setuptools 75.8.1 lowercases the package name to pillow, and also replaces each run of ".", "-" and "_" with a single "_".
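Applied to a bare project name rather than a whole wheel filename, the same logic explains the Django case. The following is a minimal sketch, assuming the setuptools 75.8.1 behaviour is as reproduced in the script above; normalize_wheel_name is a hypothetical helper name, not a setuptools function, and it skips the safe_name step because the sample names contain no unsafe characters:

```python
import re


def normalize_wheel_name(name: str) -> str:
    """Hypothetical helper mirroring new_safer_name for a bare project name."""
    # Collapse runs of "-", "_" and "." into a single "-", then lowercase.
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    # Wheel filename components use "_" where the normalized name has "-".
    return normalized.replace("-", "_")


for name in ("Django", "Pillow", "Pillow.test"):
    print(f"{name} -> {normalize_wheel_name(name)}")
```

With this rule, a requirement written as Django ends up in a wheel whose name component is django, which is consistent with the lowercase name seen in the report.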

Proposal

We can fix this in any of the following ways:

  1. Pin the build job to setuptools 75.8.0, the last release that produced uppercase package names:

    diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
    index 85d4407..211aea6 100644
    --- a/.gitlab-ci.yml
    +++ b/.gitlab-ci.yml
    @@ -19,6 +19,7 @@ build:
       image: python:$DS_PYTHON_VERSION
       stage: build
       script:
    +    - pip install 'setuptools==75.8.0'
         - pip wheel --no-binary=':all:' --wheel-dir ./dist -r requirements.txt
         # need to make ./dist world writable so the non-privileged `gitlab` user can update the contents
         - chmod 777 ./dist
  2. Downcase the package names in gemnasium/qa/expect/python-pip/default/gl-dependency-scanning-report.json.

  3. Implement option 2 above, and also normalize (lowercase) the package names in gemnasium, similar to how dependency-scanning does it in PEP503NormalizeName().
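For reference, option 3 amounts to applying PEP 503 name normalization before the report is emitted: runs of "-", "_" and "." collapse into a single "-" and the result is lowercased. The sketch below is in Python for illustration only; gemnasium would need an equivalent in its own codebase, and pep503_normalize is a hypothetical name, not the dependency-scanning function:

```python
import re


def pep503_normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Different spellings of the same project compare equal once normalized.
spellings = ["Django", "django", "DJANGO"]
print({pep503_normalize(s) for s in spellings})
print(pep503_normalize("Pillow.test") == pep503_normalize("pillow_test"))
```

Normalizing in the analyzer would make the report independent of whichever setuptools version built the wheels.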

I don't recommend option 1, since it only hides the problem; option 2 or 3 is the better choice. Either way, we'll need to update the python-pip/default/gl-dependency-scanning-report.json expectation file, so we might as well start with option 2 to get our tests passing again, and then decide whether to implement normalization in gemnasium.
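Updating the expectation file for option 2 could be scripted rather than edited by hand. This is a rough sketch, assuming every package name in the report lives under a "package" object with a "name" key, as in the snippets earlier in this issue; downcase_package_names is a hypothetical helper, and the sample report is a trimmed fragment rather than the real file:

```python
import json
from typing import Any


def downcase_package_names(node: Any) -> Any:
    """Recursively lowercase every {"package": {"name": ...}} value in place."""
    if isinstance(node, dict):
        package = node.get("package")
        if isinstance(package, dict) and isinstance(package.get("name"), str):
            package["name"] = package["name"].lower()
        for value in node.values():
            downcase_package_names(value)
    elif isinstance(node, list):
        for item in node:
            downcase_package_names(item)
    return node


# Trimmed fragment based on the report snippet shown in this issue.
report = {
    "version": "15.1.4",
    "vulnerabilities": [
        {
            "location": {
                "file": "requirements.txt",
                "dependency": {"package": {"name": "Django"}, "version": "1.11.4"},
            }
        }
    ],
}

downcase_package_names(report)
print(json.dumps(report, indent=2))
```

For the real file, the same function could be applied to the result of json.load and written back with json.dump; the resulting diff should be reviewed, since only the name fields are expected to change.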

This has also been discussed here, and it turns out that when parsing the gl-dependency-scanning-report.json report, the Rails backend normalizes the package name before creating the location and thus the fingerprint of the vulnerability:

  1. The CycloneDX parser parses the component JSON.
  2. This calls the component ci parser which calls the report component parser.
  3. The name value here is a conditional that uses the PURL type or the name in the name field.
  4. We use the PURL as the source of truth for the name for everything except the OS purl types.
  5. The PURL normalizes PyPI names.
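The last step is why the casing change does not affect vulnerability fingerprints: the purl specification requires PyPI names to be lowercased with "_" replaced by "-". The following only illustrates that rule and is not the Rails implementation; pypi_purl is a hypothetical helper:

```python
def pypi_purl(name: str, version: str) -> str:
    """Build a pkg:pypi purl string using the purl spec's PyPI name rule."""
    # PyPI names are case-insensitive and treat "_" and "-" as the same character.
    normalized = name.lower().replace("_", "-")
    return f"pkg:pypi/{normalized}@{version}"


# Both spellings resolve to the same purl, hence the same component name.
print(pypi_purl("Django", "1.11.4"))
print(pypi_purl("django", "1.11.4"))
```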

Implementation Plan

Downcase the package names in the gemnasium/qa/expect/python-pip/default/gl-dependency-scanning-report.json expectation of the following branches:

/cc @gonzoyumo @hacks4oats @ifrenkel @zmartins

Edited by Adam Cohen