PyPI simple repository API PEP 503

Steve Abrams requested to merge 327595-pypi-pep503 into master

🔎 What does this MR do and why?

Users can upload Python packages to the package registry using PyPI. PyPI makes use of the Simple Repository API as defined by PEP 503.

Part of the specification is a /simple endpoint that acts as an index of all packages in the repository. The endpoint must render a plain HTML page with one anchor link per "project", which is what we call a single "package" in the GitLab package registry. For example, here is the simple index page for the public PyPI registry: https://pypi.org/simple/.
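As an illustration of the page format, here is a minimal Python sketch that renders such an index. It is not the code in this MR (which lives in the GitLab Rails application); the function names `normalize` and `render_simple_index` are hypothetical. The name normalization follows the rule given in PEP 503, which is consistent with the URLs shown in the screenshots below.

```python
import re
from html import escape


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503:
    runs of '-', '_' and '.' collapse to a single '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_simple_index(base_url: str, title: str, packages: list) -> str:
    """Render a minimal simple index page (hypothetical helper).

    `packages` is a list of (name, requires_python) pairs.
    """
    links = "".join(
        f'<a href="{escape(base_url)}/{normalize(name)}" '
        f'data-requires-python="{escape(requires_python)}">{escape(name)}</a><br>'
        for name, requires_python in packages
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <head>\n"
        f"    <title>Links for {escape(title)}</title>\n  </head>\n  <body>\n"
        f"    <h1>Links for {escape(title)}</h1>\n    {links}\n  </body>\n</html>\n"
    )
```

`escape` HTML-encodes characters such as `<` and `>`, which matters for `data-requires-python` values like `>=3.8`.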

This MR adds the missing /simple endpoints to both the group-level and project-level PyPI API. The project-level endpoint returns a list of all packages within the project, and the group-level endpoint returns a list of all packages within the group (all packages in all projects within the group and its subgroups).
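The difference in scope between the two endpoints can be sketched in Python as follows. The `Project` and `Group` classes and both functions are hypothetical stand-ins, not GitLab models, and deduplicating names across projects is an assumption made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    packages: list = field(default_factory=list)  # package names


@dataclass
class Group:
    projects: list = field(default_factory=list)   # Project instances
    subgroups: list = field(default_factory=list)  # nested Group instances


def project_packages(project: Project) -> list:
    """Project-level scope: only this project's packages."""
    return sorted(project.packages)


def group_packages(group: Group) -> list:
    """Group-level scope: packages from every project in the group
    and, transitively, in all of its subgroups."""
    names = set()  # assumption: each name is listed once
    stack = [group]
    while stack:
        current = stack.pop()
        for project in current.projects:
            names.update(project.packages)
        stack.extend(current.subgroups)
    return sorted(names)
```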

Database

The query in the finder does not change, but we now take a larger set of packages and iterate over them with EachBatch. The query analysis below is meant to show how EachBatch performs here and how the performance compares to the already existing queries used by this finder.

The existing finder does not perform well at the group level, especially for cold-cache queries. This is a known issue for all package formats when querying at the group level. The queries introduced here all perform better than the existing ones; they do not modify the existing queries, but only add an EachBatch iteration.

For the EachBatch queries, I did not reset to a cold-cache state because these queries run in succession: the first (min) query will be cold-cache, but the following queries should behave as warm-cache.
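To make the shape of those successive queries concrete, here is a rough Python sketch of batched iteration by id range. It is not GitLab's EachBatch implementation (which is Ruby and runs against the database); the function `each_batch` and the in-memory list of ids are assumptions used to mimic the pattern of a first "min" query followed by one boundary lookup and one range query per batch.

```python
from typing import Iterator


def each_batch(ids: list, of: int = 1000) -> Iterator[list]:
    """Yield batches of unique ids selected by id range (hypothetical sketch)."""
    ordered = sorted(ids)  # stands in for ORDER BY id on an indexed column
    start = 0
    while start < len(ordered):
        # Lower bound of the batch; on the first pass this is the "min" query.
        lower = ordered[start]
        # Boundary lookup: the id `of` rows further on, if there is one.
        stop = start + of
        upper = ordered[stop] if stop < len(ordered) else None
        # Range query: lower <= id < upper (open-ended for the last batch).
        if upper is None:
            batch = [i for i in ordered if i >= lower]
        else:
            batch = [i for i in ordered if lower <= i < upper]
        yield batch
        start = stop
```

For example, `list(each_batch([5, 1, 9, 3, 7], of=2))` yields the batches `[1, 3]`, `[5, 7]` and `[9]`. Only the first lookup touches rows that nothing has read before, which is the intuition behind treating the later queries as warm-cache.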

The project used for these queries has over 10000 PyPI packages, which is at the top of the range for projects with PyPI packages.

Setup in postgres.ai

-- add a user to a project with many pypi packages
exec insert into project_authorizations values (1, 18412832, 50);

Project-level query

Group-level query

🎞 Screenshots or screen recordings

Project level:

$ curl http://__token__:$TOKEN@gdk.test:3001/api/v4/projects/30/packages/pypi/simple
        <!DOCTYPE html>
        <html>
          <head>
            <title>Links for asdf</title>
          </head>
          <body>
            <h1>Links for asdf</h1>
            <a href="http://gdk.test:3001/api/v4/projects/30/packages/pypi/simple/my-pypi-package" data-requires-python="">my.pypi.package</a><br><a href="http://gdk.test:3001/api/v4/projects/30/packages/pypi/simple/totally-innocent-package" data-requires-python="3.8">totally_innocent_package</a><br>
          </body>
        </html>

Group level:

$ curl http://__token__:$TOKEN@gdk.test:3001/api/v4/groups/126/-/packages/pypi/simple
        <!DOCTYPE html>
        <html>
          <head>
            <title>Links for asdfasdf</title>
          </head>
          <body>
            <h1>Links for asdfasdf</h1>
            <a href="http://gdk.test:3001/api/v4/groups/126/-/packages/pypi/simple/my-pypi-package" data-requires-python="">my.pypi.package</a><br><a href="http://gdk.test:3001/api/v4/groups/126/-/packages/pypi/simple/totally-innocent-package" data-requires-python="3.8">totally_innocent_package</a><br>
          </body>
        </html>

📝 How to set up and validate locally

  1. Create a group and a project in that group (or choose one that already exists), noting the project_id and group_id
  2. Follow https://docs.gitlab.com/ee/user/packages/pypi_repository/ to create and publish 1 or more packages to the project you are using
  3. Curl the new project endpoint with a personal access token:
    $ curl http://__token__:<personal_access_token>@gdk.test:3000/api/v4/projects/<project_id>/packages/pypi/simple
  4. Curl the new group endpoint with a personal access token:
    $ curl http://__token__:<personal_access_token>@gdk.test:3000/api/v4/groups/<group_id>/-/packages/pypi/simple
  5. The package you published should be listed in both responses. Alternatively, if you are logged into GitLab, you can visit the URL in your browser, for example: http://gdk.test:3000/api/v4/projects/<project_id>/packages/pypi/simple.
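The check in step 5 can also be scripted. The following Python sketch parses a saved response body and returns the listed packages; `SimpleIndexParser` and `listed_packages` are hypothetical names, and the sketch assumes the response has the anchor-per-package shape shown in the screenshots above.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect {link text: href} for every anchor in a simple index page."""

    def __init__(self):
        super().__init__()
        self.projects = {}
        self._href = None  # href of the anchor currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text inside an anchor counts as a package name.
        if self._href is not None and data.strip():
            self.projects[data.strip()] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def listed_packages(html_text: str) -> dict:
    """Return the packages listed in a /simple response body."""
    parser = SimpleIndexParser()
    parser.feed(html_text)
    parser.close()
    return parser.projects
```

Saving the output of either curl command to a file and passing its contents to `listed_packages` gives a dictionary whose keys should include the name of the package you published.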

🛃 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #327595 (closed)

Edited by Steve Abrams