
PyPI group-level package API

Steve Abrams requested to merge 225545-pypi-group-api into master

🏛 Context

Users can publish Python PyPI packages to their GitLab projects.

Currently, when users install packages from their registry, they must specify the project that contains each package. If their packages are published across several projects, they have to provide a remote for every one of those projects: 😞

pip install \
    --extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/projects/1/packages/pypi/simple \
    --extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/projects/2/packages/pypi/simple \
    --extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/projects/3/packages/pypi/simple \
    my-pkg1 my-pkg2 other-pkg and-another-pkg

Being able to use a single group remote would be much more user-friendly:

pip install \
    --extra-index-url https://$USERNAME:$PASSWORD@gitlab.com/api/v4/groups/1/-/packages/pypi/simple \
    my-pkg1 my-pkg2 other-pkg and-another-pkg

And that is exactly what this MR does! 🙌
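To make the before/after difference concrete, here is a minimal Python sketch that assembles both pip invocations. The URL paths are copied from the commands above; the helper names `index_url` and `pip_install_args`, the host argument, and the `user`/`secret` credentials are hypothetical and only serve the illustration.

```python
from urllib.parse import quote


def index_url(host, scope, scope_id, username, password):
    """Build a PyPI 'simple' index URL for a project or a group.

    The path shapes come from the pip commands in this description;
    the function and its parameters are hypothetical.
    """
    if scope == "project":
        path = f"/api/v4/projects/{scope_id}/packages/pypi/simple"
    elif scope == "group":
        path = f"/api/v4/groups/{scope_id}/-/packages/pypi/simple"
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    creds = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return f"https://{creds}@{host}{path}"


def pip_install_args(index_urls, packages):
    """Return the argv for `pip install` with one --extra-index-url per index."""
    args = ["pip", "install"]
    for url in index_urls:
        args += ["--extra-index-url", url]
    return args + list(packages)


packages = ["my-pkg1", "my-pkg2", "other-pkg", "and-another-pkg"]

# Before: one remote per project that holds packages.
per_project = pip_install_args(
    [index_url("gitlab.com", "project", pid, "user", "secret") for pid in (1, 2, 3)],
    packages,
)

# After: a single group remote covers every project in the group.
per_group = pip_install_args(
    [index_url("gitlab.com", "group", 1, "user", "secret")],
    packages,
)
```

The resulting lists can be passed to `subprocess.run` unchanged; the sketch only builds them and does not invoke pip.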

🔎 What does this MR do?

  • Adds two new API endpoints to support installing PyPI packages using a group-level remote.
  • Updates the related documentation.
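Judging from the requests shown in the Screenshots section, the two endpoints appear to be a "simple" index page per package and a file download addressed by SHA-256. The sketch below only reconstructs those two path shapes; the function names are hypothetical and the route definitions in the MR itself remain the authority.

```python
from urllib.parse import quote


def group_simple_path(group_id, package_name):
    """Path of the group-level 'simple' index page for one package."""
    return f"/api/v4/groups/{group_id}/-/packages/pypi/simple/{quote(package_name)}"


def group_file_path(group_id, sha256, file_name):
    """Path of the group-level file download, addressed by digest and file name."""
    return f"/api/v4/groups/{group_id}/-/packages/pypi/files/{sha256}/{quote(file_name)}"


# Values taken from the curl calls in the Screenshots section.
simple = group_simple_path(167, "my.pypi.package")
download = group_file_path(
    167,
    "5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2",
    "my.pypi.package-0.0.1.tar.gz",
)
```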

🐘 Database

The Packages::Pypi::PackageFinder processes two types of queries:

  1. Project-level searches - this query does not change here.

  2. Group-level searches - the Finder was recently set up to accept group-level queries in preparation for this MR, but that code path has not been exercised until now, so this is a new query.

Visual Explain Plan: https://explain.depesz.com/s/ySoR

SQL Query
SELECT "packages_packages".*
FROM "packages_packages"
INNER JOIN "packages_package_files" ON "packages_package_files"."package_id" = "packages_packages"."id"
WHERE "packages_packages"."project_id" IN (
  SELECT "projects"."id"
  FROM "projects"
  WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
      (
        SELECT "namespaces".*
        FROM "namespaces"
        WHERE "namespaces"."type" = 'Group'
        AND "namespaces"."id" = 785414
      ) UNION (
        SELECT "namespaces".*
        FROM "namespaces", "base_and_descendants"
        WHERE "namespaces"."type" = 'Group'
        AND "namespaces"."parent_id" = "base_and_descendants"."id"
      )
    )
    SELECT id FROM "base_and_descendants" AS "namespaces"
  )
)
AND "packages_packages"."status" = 0
AND "packages_packages"."package_type" = 5
AND "packages_packages"."version" IS NOT NULL
AND "packages_package_files"."file_name" = 'mypkg-0.1.tar.gz' 
AND "packages_package_files"."file_sha256" = '\x66633964663031326136386538663436323834333631633962623137376331613561363336333134616663313532363033663732383665316666343533653033';
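A note on the long `file_sha256` literal: decoding it suggests the value being compared is the ASCII text of a hex digest (64 bytes) rather than the raw 32-byte digest. This is an observation from this one literal, not a statement about the schema. A small Python check, with the literal copied from the query above minus its `\x` prefix:

```python
# The bytea literal from the query, split into 16-character chunks for readability.
BYTEA_HEX = (
    "6663396466303132" "6136386538663436"
    "3238343336316339" "6262313737633161"
    "3561363336333134" "6166633135323630"
    "3366373238366531" "6666343533653033"
)

stored = bytes.fromhex(BYTEA_HEX)      # the bytes PostgreSQL compares against
digest_text = stored.decode("ascii")   # those bytes read as text

# A SHA-256 hex digest has 64 hexadecimal characters.
looks_like_sha256_hex = len(digest_text) == 64 and all(
    c in "0123456789abcdef" for c in digest_text
)
```

Here `digest_text` begins with `fc9df012`, which has the same form as the digests that appear in the file URLs further down.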
Explain plan (cold cache on production replica)
                                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3015.05..5049.26 rows=1 width=83) (actual time=45.048..60.175 rows=2 loops=1)
   Buffers: shared hit=7017 read=159
   I/O Timings: read=51.173
   ->  Nested Loop  (cost=3014.49..5008.48 rows=13 width=83) (actual time=30.825..59.150 rows=2 loops=1)
         Buffers: shared hit=7012 read=154
         I/O Timings: read=50.207
         ->  HashAggregate  (cost=3014.06..3051.20 rows=3714 width=4) (actual time=2.685..3.255 rows=1661 loops=1)
               Group Key: projects.id
               Buffers: shared hit=2178
               ->  Nested Loop  (cost=1590.73..3004.77 rows=3714 width=4) (actual time=0.669..2.223 rows=1661 loops=1)
                     Buffers: shared hit=2178
                     ->  HashAggregate  (cost=1590.29..1592.20 rows=191 width=4) (actual time=0.653..0.669 rows=60 loops=1)
                           Group Key: namespaces.id
                           Buffers: shared hit=305
                           ->  CTE Scan on base_and_descendants namespaces  (cost=1584.08..1587.90 rows=191 width=4) (actual time=0.059..0.632 rows=60 loops=1)
                                 Buffers: shared hit=305
                                 CTE base_and_descendants
                                   ->  Recursive Union  (cost=0.43..1584.08 rows=191 width=348) (actual time=0.056..0.557 rows=60 loops=1)
                                         Buffers: shared hit=305
                                         ->  Index Scan using index_namespaces_on_type_and_id_partial on namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=348) (actual time=0.029..0.030 rows=1 loops=1)
                                               Index Cond: (((type)::text = 'Group'::text) AND (id = 785414))
                                               Buffers: shared hit=4
                                         ->  Nested Loop  (cost=0.56..157.68 rows=19 width=348) (actual time=0.016..0.098 rows=15 loops=4)
                                               Buffers: shared hit=301
                                               ->  WorkTable Scan on base_and_descendants  (cost=0.00..0.20 rows=10 width=4) (actual time=0.000..0.002 rows=15 loops=4)
                                               ->  Index Scan using index_namespaces_on_parent_id_and_id on namespaces namespaces_2  (cost=0.56..15.73 rows=2 width=348) (actual time=0.004..0.006 rows=1 loops=60)
                                                     Index Cond: (parent_id = base_and_descendants.id)
                                                     Filter: ((type)::text = 'Group'::text)
                                                     Buffers: shared hit=301
                     ->  Index Only Scan using index_projects_on_namespace_id_and_id on projects  (cost=0.44..7.21 rows=19 width=8) (actual time=0.005..0.022 rows=28 loops=60)
                           Index Cond: (namespace_id = namespaces.id)
                           Heap Fetches: 238
                           Buffers: shared hit=1873
         ->  Index Scan using index_packages_packages_on_project_id_and_package_type on packages_packages  (cost=0.43..0.50 rows=3 width=83) (actual time=0.033..0.033 rows=0 loops=1661)
               Index Cond: ((project_id = projects.id) AND (package_type = 5))
               Filter: ((version IS NOT NULL) AND (status = 0))
               Buffers: shared hit=4834 read=154
               I/O Timings: read=50.207
   ->  Index Scan using index_packages_package_files_on_package_id_and_file_name on packages_package_files  (cost=0.56..3.13 rows=1 width=8) (actual time=0.266..0.508 rows=1 loops=2)
         Index Cond: ((package_id = packages_packages.id) AND ((file_name)::text = 'mypkg-0.1.tar.gz'::text))
         Filter: (file_sha256 = '\x66633964663031326136386538663436323834333631633962623137376331613561363336333134616663313532363033663732383665316666343533653033'::bytea)
         Buffers: shared hit=5 read=5
         I/O Timings: read=0.966
 Planning Time: 7.949 ms
 Execution Time: 60.838 ms
(45 rows)
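For readers who want to experiment with the shape of the group-level query without a GitLab database, the following sketch reproduces it in miniature with Python's built-in `sqlite3`. The table and column names mirror the query above, but the schema is reduced to the referenced columns, the rows are invented, `file_sha256` is plain text, and the CTE is hoisted to the top of the statement. It illustrates the recursive descent through subgroups and says nothing about PostgreSQL's plan or performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE namespaces (id INTEGER PRIMARY KEY, parent_id INTEGER, type TEXT);
CREATE TABLE projects (id INTEGER PRIMARY KEY, namespace_id INTEGER);
CREATE TABLE packages_packages (
    id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT,
    version TEXT, status INTEGER, package_type INTEGER
);
CREATE TABLE packages_package_files (
    id INTEGER PRIMARY KEY, package_id INTEGER, file_name TEXT, file_sha256 TEXT
);

-- Invented data: group 1 contains subgroup 2; group 3 is unrelated.
INSERT INTO namespaces VALUES (1, NULL, 'Group'), (2, 1, 'Group'), (3, NULL, 'Group');
INSERT INTO projects VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO packages_packages VALUES
    (100, 10, 'mypkg', '0.1', 0, 5),
    (200, 20, 'mypkg', '0.1', 0, 5),
    (300, 30, 'mypkg', '0.1', 0, 5);
INSERT INTO packages_package_files VALUES
    (1, 100, 'mypkg-0.1.tar.gz', 'abc'),
    (2, 200, 'mypkg-0.1.tar.gz', 'abc'),
    (3, 300, 'mypkg-0.1.tar.gz', 'abc');
""")

QUERY = """
WITH RECURSIVE base_and_descendants(id) AS (
    -- Anchor: the requested group itself.
    SELECT id FROM namespaces WHERE type = 'Group' AND id = :group_id
    UNION
    -- Recursive step: child groups of anything found so far.
    SELECT n.id FROM namespaces n
    JOIN base_and_descendants b ON n.parent_id = b.id
    WHERE n.type = 'Group'
)
SELECT p.id
FROM packages_packages p
JOIN packages_package_files f ON f.package_id = p.id
WHERE p.project_id IN (
    SELECT id FROM projects
    WHERE namespace_id IN (SELECT id FROM base_and_descendants)
)
AND p.status = 0
AND p.package_type = 5
AND p.version IS NOT NULL
AND f.file_name = :file_name
AND f.file_sha256 = :sha256
ORDER BY p.id
"""


def find_package_ids(group_id, file_name, sha256):
    """Return ids of matching packages in the group or any of its subgroups."""
    params = {"group_id": group_id, "file_name": file_name, "sha256": sha256}
    return [row[0] for row in conn.execute(QUERY, params)]
```

Querying group 1 returns the packages from its own project and from the subgroup's project, while the unrelated group 3 only sees its own package.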

📽 Screenshots (strongly suggested)

→ pip3 install --index-url http://root:$TOKEN@gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple --no-deps my.pypi.package --trusted-host gdk.test
Looking in indexes: http://root:****@gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple
Collecting my.pypi.package
  Downloading http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/3f37017232013c8ac80647f4ca34b8b726f6cba62d055cd747844ed95b3c65ff/my.pypi.package-0.0.1-py3-none-any.whl (1.6 kB)
Installing collected packages: my.pypi.package
Successfully installed my.pypi.package-0.0.1
→ curl --user root:$TOKEN "http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/simple/my.pypi.package"
        <!DOCTYPE html>
        <html>
          <head>
            <title>Links for pypi-package-1</title>
          </head>
          <body>
            <h1>Links for pypi-package-1</h1>
            <a href="http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/3f37017232013c8ac80647f4ca34b8b726f6cba62d055cd747844ed95b3c65ff/my.pypi.package-0.0.1-py3-none-any.whl#sha256=3f37017232013c8ac80647f4ca34b8b726f6cba62d055cd747844ed95b3c65ff" data-requires-python="&gt;=3.6">my.pypi.package-0.0.1-py3-none-any.whl</a><br><a href="http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2/my.pypi.package-0.0.1.tar.gz#sha256=5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2" data-requires-python="&gt;=3.6">my.pypi.package-0.0.1.tar.gz</a><br>
          </body>
        </html>
→ curl --user root:$TOKEN "http://gdk.test:3001/api/v4/groups/167/-/packages/pypi/files/5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2/my.pypi.package-0.0.1.tar.gz#sha256=5afa611b0bcd52b709ec052084e33a5517ffca96f7728ddd9f8866a30cdf76f2" >> pkg.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1163  100  1163    0     0   3313      0 --:--:-- --:--:-- --:--:--  3313
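The index page above lists each file as an anchor whose `href` ends in a `#sha256=` fragment. As a rough illustration of how a client could consume that page, the sketch below extracts the file name, URL, and digest from each anchor and checks a downloaded payload against the digest. It uses only the standard library; the class and function names, the `gitlab.example.com` host, and the sample payload are hypothetical, and pip's real implementation is more involved.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urldefrag


class SimpleIndexParser(HTMLParser):
    """Collect (file_name, url, sha256) tuples from a simple index page."""

    def __init__(self):
        super().__init__()
        self.files = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text inside an anchor is a file name.
        if self._href and data.strip():
            url, fragment = urldefrag(self._href)
            sha256 = fragment.partition("sha256=")[2] or None
            self.files.append((data.strip(), url, sha256))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def matches_sha256(content, expected_hex):
    """Return True if the payload hashes to the digest advertised in the index."""
    return hashlib.sha256(content).hexdigest() == expected_hex


# Invented payload and page, shaped like the response above.
payload = b"example archive bytes"
digest = hashlib.sha256(payload).hexdigest()
page = (
    "<html><body><h1>Links for my.pypi.package</h1>"
    f'<a href="https://gitlab.example.com/api/v4/groups/1/-/packages/pypi/files/'
    f'{digest}/my.pypi.package-0.0.1.tar.gz#sha256={digest}" '
    'data-requires-python="&gt;=3.6">my.pypi.package-0.0.1.tar.gz</a><br>'
    "</body></html>"
)

parser = SimpleIndexParser()
parser.feed(page)
file_name, file_url, advertised = parser.files[0]
verified = matches_sha256(payload, advertised)
```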

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #225545 (closed)

Edited by Steve Abrams
