Normalization of package names in gemnasium-python doesn't adhere to PEP 503
Summary
When searching for advisories matching a project dependency, gemnasium-python normalizes package names. It implements normalization rules documented in PEP 426. However, it doesn't implement the normalization rules documented in PEP 503.
This might result in false negatives in the Dependency Scanning reports generated by gemnasium-python: a vulnerability affecting a project dependency might go unreported when the dependency name and the advisory name differ only in ways that PEP 503 treats as equivalent.
Further details
Quoting https://peps.python.org/pep-0503/#normalized-names:
This PEP references the concept of a “normalized” project name. As per PEP 426 the only valid characters in a name are the ASCII alphabet, ASCII numbers, ., -, and _. The name should be lowercased with all runs of the characters ., -, or _ replaced with a single - character. This can be implemented in Python with the re module:
    import re

    def normalize(name):
        return re.sub(r"[-_.]+", "-", name).lower()
This is implemented in https://github.com/pypa/packaging/blob/4d8534061364e3cbfee582192ab81a095ec2db51/src/packaging/utils.py#L43 and should be ported to https://gitlab.com/gitlab-org/security-products/analyzers/gemnasium/-/blob/9d5610d396054fab8e163985a7f40dcdac58bd1e/advisory/repo.go#L217.
See #440196 (comment 1757742165)
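To illustrate the rule quoted above, here is a minimal sketch of how different spellings of one project name collapse to the same key under PEP 503. The `advisories` index, the `find_advisories` helper, the package name, and the advisory ID are all hypothetical placeholders for illustration; they are not taken from gemnasium or from any advisory database.

```python
import re


def normalize(name):
    """PEP 503 normalization, as quoted from the PEP above."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Hypothetical advisory index keyed by normalized package name.
# The package name and advisory ID are placeholders, not real data.
advisories = {normalize("Example.Package"): ["ADVISORY-PLACEHOLDER-1"]}


def find_advisories(dependency_name):
    """Look up advisories for a dependency after normalizing its name."""
    return advisories.get(normalize(dependency_name), [])


# Spellings that PEP 503 treats as the same project.
spellings = [
    "example-package",
    "Example_Package",
    "example.package",
    "EXAMPLE--PACKAGE",
    "example._-package",
]

for spelling in spellings:
    print(spelling, "->", normalize(spelling), find_advisories(spelling))
```

If the lookup normalized only some of these differences (for example case but not runs of separators), the spellings it missed would return no advisories, which is the false negative described in the summary.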
Possible fixes
- Add a function to implement PEP 503 normalization on a Python package's name
  - Replace instances of regex `[-_.]+` with `-`
  - Downcase name
- Update the Python parsers that initialize Python packages to call the normalization
- Add/update corresponding tests
- Fix expression in pythonGlob so that we follow PEP 503 when we compare the scanned dependencies with the advisory paths.
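The comparison these steps aim for can be sketched as follows. This is written in Python only to stay close to the PEP's own snippet; the actual change would be made in the analyzer's Go code. `same_package`, `name_pattern`, and the test table are hypothetical names introduced for illustration. What `pythonGlob` currently does is not shown in this issue, so the sketch does not reproduce it. It assumes ASCII-only names, as PEP 503 requires.

```python
import re

# Runs of these characters are equivalent to a single "-" under PEP 503.
SEPARATOR_RUN = r"[-_.]+"


def normalize(name):
    """PEP 503 normalization."""
    return re.sub(SEPARATOR_RUN, "-", name).lower()


def same_package(dependency_name, advisory_name):
    """Hypothetical helper: compare two names after normalizing both."""
    return normalize(dependency_name) == normalize(advisory_name)


def name_pattern(name):
    """Hypothetical helper: a regex matching every spelling PEP 503 treats as equal to name."""
    parts = normalize(name).split("-")
    return re.compile(SEPARATOR_RUN.join(re.escape(part) for part in parts), re.IGNORECASE)


# Table-driven cases: (dependency name, advisory name, expected match).
cases = [
    ("Flask_SQLAlchemy", "flask-sqlalchemy", True),
    ("zope.interface", "zope-interface", True),
    ("ruamel.yaml", "ruamel_yaml", True),
    ("requests", "requests-toolbelt", False),
]

for dependency, advisory, expected in cases:
    assert same_package(dependency, advisory) == expected
    assert (name_pattern(dependency).fullmatch(advisory) is not None) == expected
```

Normalizing both sides before comparing keeps the rule in one place. The pattern variant is only needed where names have to be matched against paths that cannot be rewritten first.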