
Optimize the package finder helper when dealing with deploy tokens [RUN ALL RSPEC] [RUN AS-IF-FOSS]

🍌 Context

The maven package finder is heavily used by the maven package registry.

We noticed that, at the group level, some requests had very slow response times.

Our analysis uncovered an issue that occurs when users authenticate with the maven package registry using deploy tokens.

Basically, Active Record seems to have a hard time merging two scopes with identical conditions, and we end up with a SQL query similar to this one:

WITH "maven_metadata_by_path" AS (
        SELECT "packages_maven_metadata"."id",
               "packages_maven_metadata"."package_id"
          FROM "packages_maven_metadata"
         WHERE "packages_maven_metadata"."path" = 'gl/pru/maven_pkg_01_02_03_04_05_06/0.7.4'
       ) SELECT "packages_packages".*
  FROM "packages_packages"
 INNER JOIN maven_metadata_by_path
    ON maven_metadata_by_path.package_id=packages_packages.id
 WHERE "packages_packages"."project_id" IN (
        SELECT "projects"."id"
          FROM "projects"
         WHERE "projects"."namespace_id" IN (
                SELECT "id"
                  FROM (
                        SELECT "namespaces".*
                          FROM "namespaces"
                         INNER JOIN (
                                SELECT "id",
                                       "depth"
                                  FROM (
                                        WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT DISTINCT "namespaces".*,
                                               ROW_NUMBER() OVER () AS depth
                                          FROM "base_and_descendants" AS "namespaces"
                                       ) AS "namespaces"
                                 WHERE "namespaces"."type" = 'Group'
                               ) namespaces_join_table
                            ON namespaces_join_table.id = namespaces.id
                         WHERE "namespaces"."type" = 'Group'
                         ORDER BY "namespaces_join_table"."depth" ASC
                       ) AS "namespaces"
                 WHERE "namespaces"."type" = 'Group'
               )
           AND "projects"."namespace_id" IN (
                SELECT id
                  FROM (
                        SELECT "namespaces".*
                          FROM "namespaces"
                         INNER JOIN (
                                SELECT "id",
                                       "depth"
                                  FROM (
                                        WITH RECURSIVE "base_and_descendants" AS ((SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."type" = 'Group' AND "namespaces"."id" = 252) UNION (SELECT "namespaces".* FROM "namespaces", "base_and_descendants" WHERE "namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT DISTINCT "namespaces".*,
                                               ROW_NUMBER() OVER () AS depth
                                          FROM "base_and_descendants" AS "namespaces"
                                       ) AS "namespaces"
                                 WHERE "namespaces"."type" = 'Group'
                               ) namespaces_join_table
                            ON namespaces_join_table.id = namespaces.id
                         WHERE "namespaces"."type" = 'Group'
                         ORDER BY "namespaces_join_table"."depth" ASC
                       ) AS "namespaces"
                 WHERE "namespaces"."type" = 'Group'
               )
       )
 ORDER BY "packages_packages"."id" DESC
 LIMIT 1

If you pay close attention, you'll see that the condition on "projects"."namespace_id" is duplicated: the same recursive namespaces subquery appears twice.

This MR is part of the improvements described in issue #325869 (closed).

🔬 What does this MR do?

  • Simplify some scopes by using Namespace#all_projects.
    • This does not change the generated SQL query. It merely makes the code more readable.
  • When a deploy token is used, update the function that returns the projects available to the user so that it uses DeployToken#accessible_projects.
    • The current code checks the projects using a minimum role, but this logic can't be applied to deploy tokens as they are linked to groups directly. There is no notion of minimum role.
    • Actually, we're using the exact same function that Project.public_or_visible_to_user uses.
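
To illustrate the second point, here is a minimal plain-Ruby sketch of the idea, not the actual finder code. Every name in it (SketchProject, SketchDeployToken, SketchUser, projects_visible_to, MINIMUM_ROLE) is a hypothetical stand-in, and the numeric role values are assumptions chosen for the example. It only shows why the two kinds of actors need different checks: a deploy token lists the projects of the groups it is linked to, while a regular user is filtered by a minimum role.

```ruby
# Illustrative stand-ins only; none of these are GitLab's real models.
SketchProject = Struct.new(:id, :group_id)

# A deploy token is linked directly to groups, so there is no role to compare.
class SketchDeployToken
  def initialize(group_ids)
    @group_ids = group_ids
  end

  # Hypothetical analogue of DeployToken#accessible_projects.
  def accessible_projects(all_projects)
    all_projects.select { |project| @group_ids.include?(project.group_id) }
  end
end

# A regular user has one role per group; a higher number means more access.
class SketchUser
  def initialize(roles_by_group_id)
    @roles_by_group_id = roles_by_group_id
  end

  # Returns 0 (no access) for groups the user does not belong to.
  def role_in(group_id)
    @roles_by_group_id.fetch(group_id, 0)
  end
end

MINIMUM_ROLE = 20 # assumed threshold, chosen for the example

# Hypothetical helper: pick the check that matches the kind of actor.
def projects_visible_to(actor, all_projects)
  if actor.is_a?(SketchDeployToken)
    actor.accessible_projects(all_projects)
  else
    all_projects.select { |project| actor.role_in(project.group_id) >= MINIMUM_ROLE }
  end
end

sketch_projects = [
  SketchProject.new(1, 10),
  SketchProject.new(2, 20),
  SketchProject.new(3, 30)
]
sketch_token = SketchDeployToken.new([10, 30])
sketch_user = SketchUser.new({ 10 => 30, 20 => 10 })

p projects_visible_to(sketch_token, sketch_projects).map(&:id) # => [1, 3]
p projects_visible_to(sketch_user, sketch_projects).map(&:id)  # => [1]
```

In the real code both branches are Active Record scopes rather than in-memory filters, so the sketch says nothing about the generated SQL.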

Because the maven package registry is one of the most used registries, this change is behind a feature flag as an additional safety net. Here is the tracking issue: #326808 (closed)

🖼 Screenshots (strongly suggested)

n / a

📏 Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

💽 Database review

For the explain plans of this MR, we're going to assume that these feature flags are enabled:

  • use_distinct_for_all_object_hierarchy
  • maven_metadata_by_path_with_optimization_fence
  • maven_packages_group_level_improvements

Those are past improvements that affect the SQL queries generated by the maven package finder. They are all currently enabled on gitlab.com.

See the notes for the database review:

Edited by David Fernandez
