Normalization of package names in SBOM ingestion doesn't adhere to PEP 503
Summary
When ingesting PURLs of SBOM components, the backend normalizes the names of Python packages. It implements the normalization rules documented in PEP 426, but not those documented in PEP 503.
This can produce incorrect results when scanning project dependencies, such as:
- unknown licenses (License Scanning)
- no vulnerabilities (Continuous Vulnerability Scanning)
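To illustrate how such a mismatch could surface, here is a minimal sketch. It assumes, purely for illustration, that the current normalizer only lowercases the name and replaces underscores, and that advisory or license data is keyed by the PEP 503 normalized name. `current_normalize`, `pep503_normalize`, and the `ADVISORIES` hash are hypothetical and not taken from the GitLab codebase.

```ruby
# Hypothetical stand-in for the current behavior (assumption: lowercase
# the name and replace underscores only).
def current_normalize(name)
  name.downcase.tr('_', '-')
end

# PEP 503 normalization: collapse runs of '-', '_' and '.' into a single '-'.
def pep503_normalize(name)
  name.gsub(/[-_.]+/, '-').downcase
end

# Hypothetical advisory data keyed by the PEP 503 normalized name.
ADVISORIES = { 'zope-interface' => ['EXAMPLE-ADVISORY'] }.freeze

component = 'Zope.Interface'

# The lookup misses because the '.' is left in place.
puts ADVISORIES.fetch(current_normalize(component), []).inspect
# The lookup succeeds once the name is normalized per PEP 503.
puts ADVISORIES.fetch(pep503_normalize(component), []).inspect
```

Under these assumptions the first lookup returns an empty list, which would show up as "no vulnerabilities" even though matching data exists.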
Further details
Quoting https://peps.python.org/pep-0503/#normalized-names:
This PEP references the concept of a “normalized” project name. As per PEP 426 the only valid characters in a name are the ASCII alphabet, ASCII numbers, `.`, `-`, and `_`. The name should be lowercased with all runs of the characters `.`, `-`, or `_` replaced with a single `-` character. This can be implemented in Python with the `re` module:
    import re

    def normalize(name):
        return re.sub(r"[-_.]+", "-", name).lower()
This is implemented in https://github.com/pypa/packaging/blob/4d8534061364e3cbfee582192ab81a095ec2db51/src/packaging/utils.py#L43 and should be ported to https://gitlab.com/gitlab-org/gitlab/-/blob/25a5c6e885229d6c96f5091dcfe198cb75729f4b/lib/sbom/package_url/normalizer.rb#L43.
See #440196 (comment 1757742165)
Steps to reproduce
Example Project
What is the current bug behavior?
What is the expected correct behavior?
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
Update the normalizer implementation.
Implementation plan
Update `Sbom::PackageUrl::Normalizer#normalize_pypi` to support PEP 503:
- Replace each run of characters matching the regex `[-_.]+` with a single `-`
- Keep `downcase`
`gsub(PYPI_REGEX, '-').downcase`
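The plan above could be sketched roughly as follows. This is a standalone approximation rather than the actual class: the `NormalizerSketch` module is hypothetical, and only `PYPI_REGEX` and the `gsub(...).downcase` chain come from the plan. The sample package names are illustrative inputs.

```ruby
# Standalone sketch; the real method lives in Sbom::PackageUrl::Normalizer.
module NormalizerSketch
  # Runs of '-', '_' and '.', as described in PEP 503.
  PYPI_REGEX = /[-_.]+/

  # Replace each run with a single '-' and lowercase the result.
  def self.normalize_pypi(name)
    name.gsub(PYPI_REGEX, '-').downcase
  end
end

# Print a few sample names next to their normalized form.
%w[Flask_SQLAlchemy zope.interface my--weird_.name].each do |name|
  puts "#{name} -> #{NormalizerSketch.normalize_pypi(name)}"
end
```

A spec for the change would likely want cases covering dots, mixed separators, and repeated separators, since those are the inputs where PEP 503 differs from replacing underscores alone.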