Exclude group-covered projects from search authorization to reduce redundant payload
What does this MR do and why?
When a project is shared with a group via a group link, Search::ProjectsFinder can return that project as an individual entry even when the project's namespace is already covered by the user's group membership. This produces redundant project IDs in both Advanced Search (Elasticsearch) and Exact Code Search (Zoekt) queries.
For example, given this hierarchy where the user is a member of the top-level group:
top-level group (user has access)
├── sub group 1
│ └── sub group 2
│ └── project 1 ← shared with sub group 3
└── sub group 3
project 1 is already accessible via the user's top-level group membership, but because it is also shared with sub group 3, it gets returned as a separate project ID. For large namespaces (10k+ projects), this produces ~180 KB of redundant payload per access branch in Zoekt, potentially causing HTTP 413 errors when the query is sent to multiple nodes.
Approach
This MR adds redundant-project filtering into Search::ProjectsFinder#linked_through_group_projects. When the user has direct group memberships, a TrieNode is built from those memberships and used to check whether each shared project's namespace is already covered by the user's group hierarchy.
Projects whose namespaces are already covered are excluded — unless the group share link grants a higher access level than the user's existing group membership, in which case the project is still included (since the share link provides genuinely elevated permissions).
To avoid N+1 queries and handle large group memberships, group links are fetched in batches of 5,000 using raw SQL with ANY($1::bigint[]) binds rather than an ActiveRecord subquery. group_membership_source_ids was also changed from .select(:id) (AR subquery) to .pluck_primary_key (materialized array) to support this batching approach.
Example Zoekt query result
Before (project 1 enumerated redundantly):
OR(
traversal_ids: ^top_level-sub1-sub2-,
traversal_ids: ^top_level-sub3-,
repo_ids: [project 1, ...] # redundant — already covered by traversal_ids
)
After (only genuinely uncovered projects):
OR(
traversal_ids: ^top_level-sub1-sub2-,
traversal_ids: ^top_level-sub3-,
repo_ids: [only uncovered IDs] # project 1 excluded
)
Changes
-
- Rewrite
Search::ProjectsFinder#linked_through_group_projectsto use aTrieNodebuilt from the user's direct group memberships to skip projects whose namespaces are already covered
- Rewrite
- Introduce
BATCH_SIZE = 5000and batch group link fetching via raw SQL (find_links) to avoid large subqueries - Change
group_membership_source_idsfrom.select(:id)to.pluck_primary_keyto materialize IDs for batching - Preserve projects where the group share link grants elevated access above the user's existing group membership
- Add specs for the covered/elevated-access cases in
Search::ProjectsFinderandSearch::AuthorizationContext - Add integration tests in
Search::Zoekt::AccessBranchBuilderspec covering group-level access, partial group coverage, project-only access, and global search scenarios
How to set up and validate locally
Top Level Group *(Owner)*
└── 📁 My Product Group *(Owner)*
│ └── 📋 Rails Monolith
└── 📁 Developers Group *(Owner)*
└── 📋 Apple Project
- test with a non-admin user
- login with that user
- create the hierarchy above
- in Rails Monolith project, invite the
Developers GroupwithDEVELOPERaccess - open rails console
- look up the user, and run the projects finder
user = User.find XX ::Search::ProjectsFinder.new(user: user).execute - ensure that the project
Rails Monolithis not returned in the results - checkout
masterbranch and re-run the projects finder - the project
Rails Monolithwill be returned in the results
References
- #590935 (closed)
- Equivalent short-circuit in Elastic filters: https://gitlab.com/gitlab-org/gitlab/-/blob/08368451de720280dbcdcb2be7619c447264630a/ee/lib/search/elastic/filters.rb#L1507
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.