Package registry: Optimize query for selecting projects within a namespace
When interacting with an instance or a group endpoint, we use either Namespace#all_projects
or Namespace#self_and_descendants
to select the projects where we'll retrieve packages.
Both of these methods result in this subquery:
WHERE "projects"."namespace_id" IN (
SELECT "namespaces"."id"
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
AND (traversal_ids @> ('{<namespace_id>}'))
)
This query can be optimized by instead using the Project.by_any_overlap_with_traversal_ids
scope - see thread
Example usages
Namespace#all_projects
:
unless within_public_package_registry
return within_group.all_projects.public_or_visible_to_user(user, ::Gitlab::Access::REPORTER)
end
Project.in_namespace
:
::Project
.public_or_visible_to_user(user, Gitlab::Access::REPORTER)
.or(::Project.with_public_package_registry)
.in_namespace(within_group.self_and_descendants)
Both are used in projects_visible_to_reporters
in app/finders/concerns/packages/finder_helper.rb
Implementation Details
projects_visible_to_reporters
is used in:
-
packages_visible_to_user
Packages::Npm::PackagesForUserFinder
Packages::Pypi::PackagesFinder
-
Packages::GroupOrProjectPackageFinder
- used as base class by Maven, Npm, Nuget, and Pypi package finders
-
projects_visible_to_user_including_public_registries
Packages::Maven::PackageFinder
Packages::Nuget::PackageFinder
-
projects_visible_to_user
- Did not find any calls to this method
Given that this is used by the Maven, Npm, Nuget, and Pypi package formats, this warrants using a derisk feature flag.
Expected Outcome
- Significantly reduced query execution time
- Lower database load, especially for large group hierarchies
- Improved overall performance of the package registry when dealing with projects within namespaces
Testing
- Benchmark the new query against the old one with various group sizes
- Ensure no regression in functionality
- Verify improved performance with large datasets
Edited by Radamanthus Batnag