## Improve Redaction Service / Auth Scalability in Monolith Components
### Problem
The GitLab Knowledge Graph (GKG) `RedactionService` fires hundreds of SQL queries per batch authorization request due to per-resource authorization lookups (N+1 on `project_authorizations`, `members`, `saml_providers`, and repeated `License.current` calls). Additionally, GKG query execution blocks Puma worker threads for 500ms-8s per request while the bidirectional gRPC stream runs.
### Solution
Five MRs across two repos addressing query performance and thread utilization.
| MR | What it does |
|---|---|
| https://gitlab.com/gitlab-org/gitlab/-/merge_requests/229374+ | Eager-load associations in `PRELOAD_ASSOCIATIONS` (72% query reduction) |
| https://gitlab.com/gitlab-org/gitlab/-/merge_requests/229378+ | Batch pre-seed authorization caches + `License.current` SafeRequestStore |
| https://gitlab.com/gitlab-org/gitlab/-/merge_requests/229394+ | Move GKG queries from Puma to Workhorse via SendData/Injecter pattern |
| https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/732+ | Go protobuf stubs for Workhorse integration |
| https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/730+ | ADR 008: Workhorse query acceleration design doc |
Related issues:
- https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/349+ Move GKG queries to Workhorse (tracking issue)
- https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/330+ Tune tonic gRPC HTTP/2 settings
- https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/404+ Correlation ID tokio::spawn bug
---
### [!229374](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/229374): Eager-load associations
Added missing associations to `PRELOAD_ASSOCIATIONS` so they are batch-loaded upfront in the existing `includes()` call. DeclarativePolicy conditions access `:organization`, `:saml_provider`, `:namespace`, `:assignees`, `:author`, and `:system_note_metadata` during `Ability.allowed?` checks, but these were not being eagerly loaded, causing N+1 lazy-load queries per resource.
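The resulting constant has roughly this shape; the association lists below are illustrative, keyed the way the MR describes, not copied from the diff:

```ruby
# Hypothetical shape of PRELOAD_ASSOCIATIONS after the MR -- the association
# lists are illustrative examples, not the actual diff contents.
PRELOAD_ASSOCIATIONS = {
  'Issue' => [:author, :assignees, { project: [:organization, :namespace] }],
  'Note'  => [:author, :system_note_metadata, { project: :namespace }],
  'Group' => [:organization, :saml_provider]
}.freeze

# The existing batch load then issues one query per association instead of
# one lazy-load query per resource, e.g.:
#   Issue.where(id: ids).includes(*PRELOAD_ASSOCIATIONS['Issue'])
```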
**Results (215 resources, cold caches):**
| Metric | Before | After | Reduction |
|---|---|---|---|
| Mixed batch queries | 430 | 118 | 72% |
| system_note_metadata | 50 (N+1) | 1 (batch) | 98% |
| licenses | 217 | 55 | 75% |
---
### [!229378](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/229378): Batch authorization cache pre-seeding
Before policy evaluation, `RedactionService` now collects all projects and groups from loaded resources in a single pass, then runs `ProjectPolicyPreloader` and `GroupPolicyPreloader` to batch-load `max_access_level` into `RequestStore`. This replaces per-resource N+1 queries on `project_authorizations` and `members` tables.
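The pre-seeding step can be sketched with plain-Ruby stand-ins (the hash below plays the role of `RequestStore`, and `batch_load_max_access_levels` stands in for the single batched query the preloaders issue; all names are illustrative):

```ruby
# Stand-in for RequestStore: one request-scoped cache hash.
AUTH_CACHE = {}

# Stand-in for the preloaders' single batched query over
# project_authorizations (here every project grants Developer = 30).
def batch_load_max_access_levels(project_ids)
  project_ids.to_h { |id| [id, 30] }
end

# Single pass over loaded resources, then one batch load -- this replaces
# one project_authorizations query per resource.
def preseed_authorization_cache(resources)
  project_ids = resources.map { |r| r[:project_id] }.compact.uniq
  AUTH_CACHE[:max_access_level] = batch_load_max_access_levels(project_ids)
end

# Per-resource policy checks now read the warm cache instead of querying.
def can_read?(resource)
  AUTH_CACHE.fetch(:max_access_level, {}).fetch(resource[:project_id], 0) >= 20
end
```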
Also wraps `License.current` in `SafeRequestStore`: without caching, every call parses the license JSON and allocates two `License` AR objects, which adds up to ~800 unnecessary allocations per 50-project batch.
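The memoization itself is request-scoped caching; a self-contained sketch, where a plain hash stands in for `Gitlab::SafeRequestStore` and the license payload is made up:

```ruby
require 'json'

# Stand-ins: a request-scoped store and a counter for JSON parses.
REQUEST_STORE = {}
PARSE_COUNT = { licenses: 0 }

LICENSE_ROW = '{"plan":"ultimate","expires_at":"2026-01-01"}' # made-up payload

# Stand-in for the uncached License.current: parse + allocate on every call.
def load_license
  PARSE_COUNT[:licenses] += 1
  JSON.parse(LICENSE_ROW)
end

# Cached version: the first call in a request parses; the rest hit the store.
# Storing inside the fetch block means even a nil (no-license) result is
# cached, since fetch checks key presence rather than truthiness.
def current_license
  REQUEST_STORE.fetch(:current_license) do
    REQUEST_STORE[:current_license] = load_license
  end
end
```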
**Postgres.ai results (session webui-150381, production clone):**
| Query | Batch (1 query) | N+1 per call | N+1 x50 total | Improvement |
|---|---|---|---|---|
| `project_authorizations` | 23.9ms | 1.6ms | 95ms | 2.5x, 49 fewer round trips |
| `saml_providers` | 0.07ms | 0.9ms | 26ms | 371x, 28 fewer round trips |
| group member access | 85.7ms | 167ms | 3,340ms | 39x, 19 fewer round trips |
| `License.current` | 0ms (cached) | 1.4ms | 122ms | 86 JSON parses eliminated |
---
### [!229394](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/229394): Workhorse query acceleration
Moves the bidirectional gRPC stream between Rails and GKG from Puma worker threads to Workhorse goroutines using the SendData/Injecter pattern (same pattern used by Gitaly for blob serving).
**How it works:**
1. Rails receives the query request, authenticates the user, builds a JWT
2. Rails responds with a `Gitlab-Workhorse-Send-Data` header containing the GKG connection details
3. Workhorse intercepts the header, opens a bidirectional gRPC stream to GKG
4. During the stream, GKG sends `RedactionRequired` messages back to Workhorse
5. Workhorse calls a Rails internal API (`POST /api/v4/internal/orbit/redaction`) to check permissions
6. Workhorse sends authorized resource IDs back to GKG, which filters results
7. Workhorse writes the final JSON response to the client
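Steps 1-2 boil down to Workhorse's standard Send-Data encoding: a type prefix plus URL-safe-Base64 JSON params. A minimal sketch of the Rails side, where the `orbit-query` type name and the param field names are assumptions, not taken from the MR:

```ruby
require 'json'
require 'base64'

# Hedged sketch of the Rails side of the handoff. The "orbit-query" type
# and the param field names are illustrative assumptions.
def orbit_send_data(gkg_address, jwt)
  params = { 'Address' => gkg_address, 'Token' => jwt }
  'orbit-query:' + Base64.urlsafe_encode64(params.to_json)
end

# The controller would respond with roughly:
#   headers['Gitlab-Workhorse-Send-Data'] = orbit_send_data(addr, jwt)
# Workhorse matches the type prefix, decodes the params, and takes over
# the response from there.
```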
**Impact:** Puma worker blocking drops from 500ms-8s (full query duration) to two short calls (~10-50ms each: initial auth + redaction callback). The query execution and streaming happen entirely in Workhorse goroutines.
**Implementation:**
- `workhorse/internal/orbit/`: 3 Go files (`client.go`, `sendquery.go`, `redaction.go`) + generated proto stubs
- `ee/lib/gitlab/workhorse.rb`: `send_orbit_query` method
- `ee/lib/api/internal/orbit.rb`: `POST /api/v4/internal/orbit/redaction` endpoint
- `ee/app/controllers/dashboard/orbit/data.rb`: `ORBIT_WORKHORSE_ENABLED` env var gate
- All 5 query types pass E2E: search, traversal, aggregation, path_finding, neighbors
- MCP `query_graph` tool also routed through Workhorse via `McpId` param
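The redaction callback (step 5 above) is a small contract: Workhorse posts the candidate ids GKG flagged as `RedactionRequired`, and Rails answers with the subset the user may see. A sketch of that core logic in plain Ruby; the `authorized_ids` field name is an assumption, and the real endpoint is a Grape API:

```ruby
require 'json'
require 'set'

# Hedged sketch of the internal redaction callback's core logic; the
# 'authorized_ids' response field name is an illustrative assumption.
def redaction_response(requested_ids, permitted_ids)
  permitted = permitted_ids.to_set
  { 'authorized_ids' => requested_ids.select { |id| permitted.include?(id) } }.to_json
end
```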
**Supporting MRs:**
- https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/732+ Go protobuf stubs published as sub-module at `proto/go/`
- https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/merge_requests/730+ ADR 008 design document with PlantUML diagrams