Skip to content

Search pypi packages by normalized name

Steve Abrams requested to merge 241678-python-name-normalization into master

What does this MR do?

PyPI packages can be named using characters like . and _, however, when the pip tool is used to install packages, it normalizes the package name, replacing those characters with hyphens - in accordance with the python normalized names standard.

This means if a user uploads a package my.package, then when pip tries to install it, it will request my-package. GitLab will try to find a package named my-package belonging to the requested project and fail.

This MR updates the finding of matching pypi packages to look at the normalized version of the name saved in GitLab's database when comparing to the requested name.

Database

These are the queries and plans for the section of code that uses the new scope .with_normalized_pypi_name. Note that only the last conditional is being changed as the rest of the query is generated from other scopes.

Old query
SELECT "packages_packages".*
FROM "packages_packages"
WHERE "packages_packages"."project_id" = 20760965
  AND "packages_packages"."package_type" = 5
  AND "packages_packages"."version" IS NOT NULL
  AND ("packages_packages"."package_type" != 4
       OR "packages_packages"."name" != 'NuGet.Temporary.Package')
  AND "packages_packages"."name" = 'mypypi.package';
--------------------------------------------------
Query Plan
--------------------------------------------------
Index Scan using index_packages_packages_on_project_id_and_package_type on public.packages_packages  (cost=0.42..3.45 rows=1 width=84) (actual time=0.025..0.026 rows=1 loops=1)
   Index Cond: ((packages_packages.project_id = 20760965) AND (packages_packages.package_type = 5))
   Filter: ((packages_packages.version IS NOT NULL) AND ((packages_packages.name)::text = 'mypypi.package'::text))
   Rows Removed by Filter: 0
   Buffers: shared hit=4
Time: 0.713 ms
  planning: 0.651 ms
  execution: 0.062 ms
      I/O read: 0.000 ms
      I/O write: 0.000 ms
Shared buffers:
  hits: 4 (~32.00 KiB) from the buffer pool
  reads: 0 from the OS file cache, including disk I/O
  dirtied: 0
  writes: 0
https://explain.depesz.com/s/1RZW
New query
SELECT "packages_packages".*
FROM "packages_packages"
WHERE "packages_packages"."project_id" = 20760965
  AND "packages_packages"."package_type" = 5
  AND "packages_packages"."version" IS NOT NULL
  AND ("packages_packages"."package_type" != 4
       OR "packages_packages"."name" != 'NuGet.Temporary.Package')
  AND (LOWER(regexp_replace(name, '[-_.]+', '-', 'g')) = 'mypypi.package');
--------------------------------------------------
Query Plan
--------------------------------------------------
Index Scan using index_packages_packages_on_project_id_and_package_type on public.packages_packages  (cost=0.42..3.46 rows=1 width=84) (actual time=9.021..9.021 rows=0 loops=1)
   Index Cond: ((packages_packages.project_id = 20760965) AND (packages_packages.package_type = 5))
   Filter: ((packages_packages.version IS NOT NULL) AND (lower(regexp_replace((packages_packages.name)::text, '[-_.]+'::text, '-'::text, 'g'::text)) = 'mypypi.package'::text))
   Rows Removed by Filter: 1
   Buffers: shared hit=3 read=4
   I/O Timings: read=8.513
Time: 0.505 ms
  planning: 0.456 ms
  execution: 0.049 ms
    I/O read: 0.000 ms
    I/O write: 0.000 ms
Shared buffers:
  hits: 4 (~32.00 KiB) from the buffer pool
  reads: 0 from the OS file cache, including disk I/O
  dirtied: 0
  writes: 0
https://explain.depesz.com/s/XTuj

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #241678 (closed)

Edited by Steve Abrams

Merge request reports