PyPI simple repository API PEP 503
🔎 What does this MR do and why?
Users can upload Python packages to the package registry using PyPI. PyPI makes use of the Simple Repository API as defined by PEP 503.
Part of the specification is supporting an endpoint /simple which acts as an index for all packages within the repository. The endpoint is specified to render a plain HTML page with anchored links to each "project", which is what we call a single "package" in the GitLab package registry. For example, here is the simple index page for the public PyPI registry: https://pypi.org/simple/.
This MR adds the missing /simple endpoints to both the group-level and project-level PyPI API. The project-level endpoint returns a list of all packages within the project, and the group-level endpoint returns a list of all packages within the group (all packages in all projects within the group and its subgroups).
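To make the expected response shape concrete, here is a minimal Python sketch of rendering such an index page. It is not the code in this MR (which lives in the Rails API); the function names and the (name, requires_python) input shape are assumptions made for illustration. The name normalization rule is the one given in PEP 503.

```python
import html
import re
from typing import Iterable, Tuple


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_simple_index(
    title: str, base_url: str, packages: Iterable[Tuple[str, str]]
) -> str:
    """Render a plain HTML index with one anchor per package.

    `packages` yields hypothetical (name, requires_python) pairs.
    """
    links = "".join(
        f'<a href="{base_url}/{normalize(name)}" '
        f'data-requires-python="{html.escape(requires_python)}">'
        f"{html.escape(name)}</a><br>"
        for name, requires_python in packages
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>Links for {safe_title}</title>\n"
        "</head>\n<body>\n"
        f"<h1>Links for {safe_title}</h1>\n"
        f"{links}\n"
        "</body>\n</html>\n"
    )
```

Each link target uses the normalized name, while the link text keeps the name as published.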
Database
The query in the finder does not change, but we now take a larger set of packages and iterate over them with EachBatch. The query analysis below is meant to show how EachBatch performs here and how its performance compares to the existing queries used by this finder.
The existing finder does not perform well at the group level, especially for cold-cache queries. This is a known issue for all package formats when querying at the group level. The new queries all perform better than the existing ones; they do not modify the existing queries, but only add an EachBatch iteration.
For the EachBatch queries, I did not reset to a cold-cache state, since these queries run in succession: the first (min ID) query will be cold-cache, but the following queries should behave as warm-cache.
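For readers unfamiliar with this style of batching, the three query steps analyzed below (get min ID, get top of batch ID, select batch) can be sketched roughly as follows. This is an illustrative Python/SQLite approximation, not GitLab's EachBatch implementation; the packages table, its columns, and the helper names are assumptions.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def build_demo_db(ids: Iterable[int]) -> sqlite3.Connection:
    """Create an in-memory table of hypothetical packages for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO packages (id, name) VALUES (?, ?)",
        [(i, f"package-{i}") for i in ids],
    )
    return conn


def each_batch(
    conn: sqlite3.Connection, batch_size: int
) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows in ID order, batch_size rows at a time."""
    # Step 1: get the min ID.
    row = conn.execute("SELECT id FROM packages ORDER BY id LIMIT 1").fetchone()
    start = row[0] if row else None
    while start is not None:
        # Step 2: get the top of the batch, i.e. the first ID past this batch.
        row = conn.execute(
            "SELECT id FROM packages WHERE id >= ? ORDER BY id LIMIT 1 OFFSET ?",
            (start, batch_size),
        ).fetchone()
        stop = row[0] if row else None
        # Step 3: select the batch between the two boundaries.
        if stop is None:
            batch = conn.execute(
                "SELECT id, name FROM packages WHERE id >= ? ORDER BY id",
                (start,),
            ).fetchall()
        else:
            batch = conn.execute(
                "SELECT id, name FROM packages WHERE id >= ? AND id < ? ORDER BY id",
                (start, stop),
            ).fetchall()
        yield batch
        start = stop
```

Because each batch is bounded by IDs rather than by a growing offset, the per-batch queries stay cheap even when the ID sequence has gaps.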
The project used for these queries has over 10000 PyPI packages, which is at the top of the range for projects with PyPI packages.
Setup in postgres.ai
-- add a user to a project with many pypi packages
exec insert into project_authorizations values (1, 18412832, 50);
Project-level query
- Existing Finder query, 389ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37474
- New Finder query with each_batch:
  - Get min ID, 15ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37477
  - Get top of batch ID, 24ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37481
  - Select batch, 3ms: https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37482
Group-level query
- Existing Finder query, 17s: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37485
- New Finder query with each_batch:
  - Get min ID, 356ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37478
  - Get top of batch ID, 37ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37479
  - Select batch, 43ms: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10469/commands/37480
🎞 Screenshots or screen recordings
Project level:
$ curl http://__token__:$TOKEN@gdk.test:3001/api/v4/projects/30/packages/pypi/simple
<!DOCTYPE html>
<html>
<head>
<title>Links for asdf</title>
</head>
<body>
<h1>Links for asdf</h1>
<a href="http://gdk.test:3001/api/v4/projects/30/packages/pypi/simple/my-pypi-package" data-requires-python="">my.pypi.package</a><br><a href="http://gdk.test:3001/api/v4/projects/30/packages/pypi/simple/totally-innocent-package" data-requires-python="3.8">totally_innocent_package</a><br>
</body>
</html>
Group level:
$ curl http://__token__:$TOKEN@gdk.test:3001/api/v4/groups/126/-/packages/pypi/simple
<!DOCTYPE html>
<html>
<head>
<title>Links for asdfasdf</title>
</head>
<body>
<h1>Links for asdfasdf</h1>
<a href="http://gdk.test:3001/api/v4/groups/126/-/packages/pypi/simple/my-pypi-package" data-requires-python="">my.pypi.package</a><br><a href="http://gdk.test:3001/api/v4/groups/126/-/packages/pypi/simple/totally-innocent-package" data-requires-python="3.8">totally_innocent_package</a><br>
</body>
</html>
📝 How to set up and validate locally
- Create a group and a project in that group (or choose one that already exists), noting the project_id and group_id
- Follow https://docs.gitlab.com/ee/user/packages/pypi_repository/ to create and publish 1 or more packages to the project you are using
- Curl the new project endpoint with a personal access token:
$ curl http://__token__:<personal_access_token>@gdk.test:3000/api/v4/projects/<project_id>/packages/pypi/simple
- Curl the new group endpoint with a personal access token:
$ curl http://__token__:<personal_access_token>@gdk.test:3000/api/v4/groups/<group_id>/-/packages/pypi/simple
- The package you published should be listed in both responses. Alternatively, if you are logged into GitLab, you can visit the URL in your browser, for example: http://gdk.test:3000/api/v4/projects/<project_id>/packages/pypi/simple.
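If you would rather check the responses programmatically than by eye, a small parser along these lines could list the packages found in a saved response body. This is only a sketch using the Python standard library; the class and function names are made up for illustration, and fetching the page (for example with the curl commands above) is left out.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SimpleIndexParser(HTMLParser):
    """Collect (href, link text) pairs from a simple index page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when an anchor with an href opens.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Record the link once its anchor closes.
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def listed_packages(page: str) -> List[Tuple[str, str]]:
    """Return the (href, name) pairs for every anchor in the page."""
    parser = SimpleIndexParser()
    parser.feed(page)
    parser.close()
    return parser.links
```

Calling listed_packages on the body returned by either endpoint should include an entry whose name matches the package you published.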
🛃 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #327595 (closed)