Skip to content

Improvements to the Maven API files endpoints [RUN ALL RSPEC] [RUN AS-IF-FOSS]

MR 57600 🎉

🐿 Context

The GitLab maven package registry exposes a number of endpoints at different levels:

  • Project, under ``
  • Group, under ``
  • Instance, under ``

In this MR, we're going to tackle the performance of the Group level file endpoint. The endpoint logic is quite simple:

  • locate the package within the group
  • check the permissions
  • locate the package file requested
  • return it
    • Well actually, on gitlab.com a redirect to the object storage location is returned.

Here is the p95 for the response time. Well... I think it's obvious, this chart does not bring joy.

Looking at a particularly long request, we see this:

json.duration_s: 15.407
json.db_duration_s: 15.35

😱

So clearly, we have an issue with the queries triggered with that endpoint.

We conducted a small analysis on this part. We identified 3 possible improvements:

  1. Avoid accessing the projects table in this scope
  2. Avoid accessing the packages_packages table 3 times
    • That's a principle from the "Rails performance" course: Don't access a table more than once per web request
  3. Implement #287638 (closed) which brings an additional condition when selecting projects.

The above is exactly #325869 (closed) and this MR will implement these 3 changes.

The maven package registry being heavily used, those changes will be gated behind a feature flag.

🔬 What does this MR do?

Improvement 1: avoid accessing the projects table in this scope

  • Put the #any? call behind the feature flag
  • Add specs for Packages::Packages.for_projects

Improvement 2: Avoid accessing the packages_packages table 3 times

  • These access were done to check if the finder was dealing with versionless packages (a package with a nil version). When it was the case, the finder had to override the default order (most recent package created first) to a specific order (package with most recent package file created first)
  • The client of that finder can already tell if we're dealing with versionless packages or not. As such, the finder only needs to implement an option (order_by_package_file) which will use the custom ordering or not.
  • Updated the related specs
  • Updated maven_packages.rb to detect versionless packages and use the custom order in that case.
    • Note that the finder is used on the file endpoint on all levels but the change will only happen when the finder works with a group (Group level)
  • Updated the maven_packages.rb related specs
    • Add more examples for the group level context to verify that the versionless package detection was working as expected

Improvement 3: Implement #287638 (closed)

  • The package finder helper was simply used for this. It's usage is gated behind the feature flag
  • Update the related specs

📸 Screenshots (strongly suggested)

I can't really benchmark the endpoint to show a reduction in response time. This is due to the conditions needed to have slow requests on the Group level: we need many, many, many subgroups and sub projects with many, many packages.

Instead, I will dump the requests triggered by the endpoint and hopefully, we should have less queries with this MR.

Before this MR

User Load (0.6ms)  SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1 

Group Load (0.6ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252 LIMIT 1

IpRestriction Load (0.3ms)  SELECT "ip_restrictions".* FROM "ip_restrictions" WHERE "ip_restrictions"."group_id" = 252  

Project Exists? (2.4ms)  SELECT 1 AS one FROM "projects" WHERE "projects"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT id FROM "base_and_descendants" AS "namespaces") AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 1 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (0,10,20)) LIMIT 1 

Packages::Package Exists? (2.4ms)  SELECT 1 AS one FROM "packages_packages" INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id" WHERE "packages_packages"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT id FROM "base_and_descendants" AS "namespaces") AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 1 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (0,10,20))) AND "packages_maven_metadata"."path" = 'gl/pru/maven_pkg_01/9.2.1' LIMIT 1

Packages::Package Load (2.1ms)  SELECT "packages_packages".* FROM "packages_packages" INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id" WHERE "packages_packages"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT id FROM "base_and_descendants" AS "namespaces") AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 1 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (0,10,20))) AND "packages_maven_metadata"."path" = 'gl/pru/maven_pkg_01/9.2.1' ORDER BY "packages_packages"."id" ASC LIMIT 1 

Packages::Package Load (2.3ms)  SELECT "packages_packages".* FROM "packages_packages" INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id" WHERE "packages_packages"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT id FROM "base_and_descendants" AS "namespaces") AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 1 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (0,10,20))) AND "packages_maven_metadata"."path" = 'gl/pru/maven_pkg_01/9.2.1' ORDER BY "packages_packages"."id" DESC LIMIT 1

Project Load (0.8ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = 289 LIMIT 1 

ProjectFeature Load (0.5ms)  SELECT "project_features".* FROM "project_features" WHERE "project_features"."project_id" = 289 LIMIT 1 

Group Load (0.5ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 253 AND "namespaces"."type" = 'Group' LIMIT 1 

Namespace Load (0.5ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 252 LIMIT 1 

Group Load (1.3ms)  WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 253)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "namespaces".* FROM "base_and_ancestors" AS "namespaces" WHERE "namespaces"."parent_id" IS NULL LIMIT 1 

SamlProvider Load (0.3ms)  SELECT "saml_providers".* FROM "saml_providers" WHERE "saml_providers"."group_id" = 252 LIMIT 1 

Packages::PackageFile Load (0.5ms)  SELECT "packages_package_files".* FROM "packages_package_files" WHERE "packages_package_files"."package_id" = 488 AND "packages_package_files"."file_name" = 'maven_pkg_01-9.2.1.pom' ORDER BY "packages_package_files"."id" DESC LIMIT 1 

14 SQL queries

With this MR

User Load (2.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" = 1 ORDER BY "users"."id" ASC LIMIT 1 

Group Load (2.0ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252 LIMIT 1

IpRestriction Load (0.4ms)  SELECT "ip_restrictions".* FROM "ip_restrictions" WHERE "ip_restrictions"."group_id" = 252 

Packages::Package Load (5.9ms)  SELECT "packages_packages".* FROM "packages_packages" INNER JOIN "packages_maven_metadata" ON "packages_maven_metadata"."package_id" = "packages_packages"."id" WHERE "packages_packages"."project_id" IN (SELECT "projects"."id" FROM "projects" WHERE "projects"."namespace_id" IN (WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id" FROM "base_and_descendants" AS "namespaces") AND (EXISTS (SELECT 1 FROM "project_authorizations" WHERE "project_authorizations"."user_id" = 1 AND (project_authorizations.project_id = projects.id)) OR projects.visibility_level IN (0,10,20))) AND "packages_maven_metadata"."path" = 'gl/pru/maven_pkg_01/9.2.1' ORDER BY "packages_packages"."id" DESC LIMIT 1 

Project Load (0.9ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = 289 LIMIT 1 

ProjectFeature Load (0.5ms)  SELECT "project_features".* FROM "project_features" WHERE "project_features"."project_id" = 289 LIMIT 1 

Group Load (0.5ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 253 AND "namespaces"."type" = 'Group' LIMIT 1 

Namespace Load (0.7ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 252 LIMIT 1 

Group Load (1.0ms)  WITH RECURSIVE "base_and_ancestors" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 253)
UNION
(SELECT "namespaces".* FROM "namespaces", "base_and_ancestors" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = "base_and_ancestors"."parent_id")) SELECT "namespaces".* FROM "base_and_ancestors" AS "namespaces" WHERE "namespaces"."parent_id" IS NULL LIMIT 1

SamlProvider Load (0.4ms)  SELECT "saml_providers".* FROM "saml_providers" WHERE "saml_providers"."group_id" = 252 LIMIT 1 

Packages::PackageFile Load (0.9ms)  SELECT "packages_package_files".* FROM "packages_package_files" WHERE "packages_package_files"."package_id" = 488 AND "packages_package_files"."file_name" = 'maven_pkg_01-9.2.1.pom' ORDER BY "packages_package_files"."id" DESC LIMIT 1

11 SQL queries.

From the original 14 queries:

    • 2 queries on packages_packages table
    • 1 query for the Project exists?

🔩 Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

💿 Database review

The changes in this MR will:

  • remove 3 queries
  • update an existing query:
    • The queries and the explain plans for this existing query are here
Edited by David Fernandez

Merge request reports