Draft: Query Performance Investigation packages_packages PackagesGroupFinder
Problem
Packages::GroupPackagesFinder
fetches packages within a group. Packages belong to a project, so to find all of them, we need to find all projects within a group and it's subgroups, then get all packages within those projects, and filter based on params from there. The base query (no additional filters applied) looks like:
SELECT "packages_packages".*
FROM "packages_packages"
WHERE "packages_packages"."project_id" IN
(SELECT "projects"."id"
FROM "projects"
WHERE "projects"."namespace_id" IN
(WITH RECURSIVE "base_and_descendants" AS (
(SELECT "namespaces".*
FROM "namespaces"
WHERE "namespaces"."id" = 785414)
UNION
(SELECT "namespaces".*
FROM "namespaces",
"base_and_descendants"
WHERE "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
FROM "base_and_descendants" AS "namespaces") )
AND ("packages_packages"."package_type" != 4
OR "packages_packages"."name" != 'NuGet.Temporary.Package')
ORDER BY created_at ASC;
Cold Cache Explain (11.9s): https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2261/commands/7009 Warm Cache Explain (1.6s): https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2261/commands/7011
The biggest performance factor is the index scan over the packages_packages
table:
-> Index Scan using index_packages_packages_on_project_id_and_package_type on public.packages_packages (cost=0.42..45566.12 rows=875845 width=82) (actual time=1.918..10852.814 rows=870508 loops=1)
Filter: ((packages_packages.package_type <> 4) OR ((packages_packages.name)::text <> 'NuGet.Temporary.Package'::text))
Rows Removed by Filter: 6012
Buffers: shared hit=701841 read=15853 dirtied=476
I/O Timings: read=9668.189
This type of performance pattern is common in package features:
- Get all packages belonging to a group
- Get all packages belonging to a group that a specific user has access to. This requires checking permissions on the project-level.
Other Potential problems
- All packages exist in a single table regardless of type despite regularly being filtered by type
- Filtering can vary depending on the query, so we have many indexes that are
[project_id, some_field]
.