Draft: Query Performance Investigation packages_packages PackagesGroupFinder

Problem

Packages::GroupPackagesFinder fetches packages within a group. Packages belong to a project, so to find all of them, we need to find all projects within a group and it's subgroups, then get all packages within those projects, and filter based on params from there. The base query (no additional filters applied) looks like:

SELECT "packages_packages".*
FROM "packages_packages"
WHERE "packages_packages"."project_id" IN
    (SELECT "projects"."id"
     FROM "projects"
     WHERE "projects"."namespace_id" IN
         (WITH RECURSIVE "base_and_descendants" AS (
                                                      (SELECT "namespaces".*
                                                       FROM "namespaces"
                                                       WHERE "namespaces"."id" = 785414)
                                                    UNION
                                                      (SELECT "namespaces".*
                                                       FROM "namespaces",
                                                            "base_and_descendants"
                                                       WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
          FROM "base_and_descendants" AS "namespaces") )
  AND ("packages_packages"."package_type" != 4
       OR "packages_packages"."name" != 'NuGet.Temporary.Package')
ORDER BY created_at ASC;

Cold Cache Explain (11.9s): https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2261/commands/7009 Warm Cache Explain (1.6s): https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2261/commands/7011

The biggest performance factor is the index scan over the packages_packages table:

         ->  Index Scan using index_packages_packages_on_project_id_and_package_type on public.packages_packages  (cost=0.42..45566.12 rows=875845 width=82) (actual time=1.918..10852.814 rows=870508 loops=1)
               Filter: ((packages_packages.package_type <> 4) OR ((packages_packages.name)::text <> 'NuGet.Temporary.Package'::text))
               Rows Removed by Filter: 6012
               Buffers: shared hit=701841 read=15853 dirtied=476
               I/O Timings: read=9668.189

This type of performance pattern is common in package features:

  1. Get all packages belonging to a group
  2. Get all packages belonging to a group that a specific user has access to. This requires checking permissions on the project-level.

Other Potential problems

  • All packages exist in a single table regardless of type despite regularly being filtered by type
  • Filtering can vary depending on the query, so we have many indexes that are [project_id, some_field].