Move GKG queries to Workhorse
## Problem

Every Knowledge Graph query blocks a Puma worker for the full round trip. Authentication, JWT generation, the bidirectional gRPC stream to GKG, the ClickHouse query, the redaction exchange against Postgres, and the final result delivery all run synchronously inside one Ruby thread. Nothing yields control back to the web server until the entire sequence finishes. A simple graph query can take hundreds of milliseconds. A complex traversal can take several seconds. During that window, the Puma thread is unavailable for anything else.

We discovered this during [load testing in staging](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/320). Under moderate concurrent load, Puma saturation climbed quickly. At higher request rates the system stopped recovering: gRPC deadline timeouts, 502s from overwhelmed workers, and ClickHouse running out of memory. The bottleneck was not ClickHouse or GKG. It was Puma threads held open by long-lived gRPC streams.

The issue is structural. Ruby/Puma is synchronous. Each worker handles one request at a time. The Knowledge Graph query pattern holds a bidirectional stream open for seconds while waiting on ClickHouse, then makes a callback to Postgres for authorization. That does not fit the Puma model. Workhorse, written in Go, can hold thousands of concurrent connections in goroutines. It already solves this same problem for Gitaly file streaming, archive downloads, and blob serving.

### Current architecture

The request enters Workhorse, gets forwarded to Rails, and Rails handles the rest: auth, gRPC client setup, the bidirectional stream, the redaction exchange, and the response. The Puma worker is occupied the entire time.

```plantuml
@startuml current_flow
skinparam backgroundColor #FFFFFF
skinparam sequenceArrowThickness 2
skinparam sequenceParticipantBorderColor #333333
skinparam sequenceLifeLineBorderColor #999999
skinparam noteBorderColor #CCCCCC
skinparam noteBackgroundColor #FFFFDD

title Current flow: Rails handles everything

actor Client
participant "Workhorse\n(Go)" as WH
participant "Puma Worker\n(Ruby)" as Puma
participant "GKG Server\n(Rust)" as GKG
database "ClickHouse" as CH
database "PostgreSQL" as PG

Client -> WH: POST /api/v4/orbit/query
WH -> Puma: Forward request
activate Puma #FF6666

note over Puma: Puma thread blocked\nfor entire duration\n(up to several seconds)

Puma -> Puma: Authenticate user
Puma -> Puma: Generate JWT\n(traversal_ids, org_id)

Puma -> GKG: Open bidi gRPC stream\n(ExecuteQuery)
activate GKG
GKG -> CH: Execute ClickHouse query
CH --> GKG: Result set with resource IDs
GKG --> Puma: RedactionRequired\n(resource IDs grouped by type)
deactivate GKG

Puma -> PG: Batch load resources\n+ Ability.allowed? checks
PG --> Puma: Authorization decisions

Puma -> GKG: RedactionResponse\n(authorized ID map)
activate GKG
GKG -> GKG: Filter unauthorized rows
GKG --> Puma: ExecuteQueryResult\n(final JSON)
deactivate GKG

Puma --> WH: HTTP 200 + JSON body
deactivate Puma
WH --> Client: Response

note over Puma: Worker freed after\nfull round trip completes
@enduml
```

### Why this matters at scale

GitLab.com runs a fixed number of Puma workers, several per pod across hundreds of pods. Each worker handles one request at a time. When Knowledge Graph queries hold workers for seconds, those workers are unavailable for everything else on Rails: merge request pages, API calls, CI status checks. A sustained burst of graph queries can starve unrelated features of capacity. This is not a theoretical concern.
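To put rough numbers on it (illustrative only; actual worker counts and latencies vary): by Little's law, a pod with 8 Puma workers where each graph query holds a worker for 2 seconds saturates at 8 / 2 = 4 graph queries per second. Past that rate, every request routed to the pod, graph-related or not, queues behind a blocked worker.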
The Gitaly team built Workhorse to solve this class of problem for git operations. CI archive downloads, blob streaming, and diff generation all use the SendData/Injecter pattern to free Puma workers immediately after authentication.

## Proposed solution

Move the bidirectional gRPC stream from Puma to Workhorse using the same SendData/Injecter pattern that Gitaly operations already use. Rails stays responsible for authentication and authorization, since it is the only component with access to the full DeclarativePolicy engine and Postgres. But the long-running gRPC stream, the ClickHouse wait, and the result delivery all happen in Workhorse goroutines instead.

The Puma worker gets occupied twice, briefly, instead of once for the full duration:

1. Initial request (milliseconds): Authenticate the user, generate the JWT with traversal IDs, and return a `Gitlab-Workhorse-Send-Data: orbit-query:{params}` header with an empty body. Puma is freed.
2. Redaction callback (milliseconds): When GKG sends a `RedactionRequired` message mid-stream, Workhorse makes an HTTP POST to `/api/v4/internal/orbit/redaction`. Puma runs the batch authorization check against Postgres and returns. Puma is freed again.

Everything else happens in Go; a sketch of the Workhorse side follows the diagram below.

This architecture also decouples GKG further from Rails. Today, the Rails gRPC client is tightly bound to the GKG protobuf contract and stream lifecycle. Moving the gRPC client to Workhorse means Rails only needs to know how to generate a JWT and where GKG lives. It does not need to manage stream state, handle protobuf deserialization, or coordinate the redaction exchange inline. If the GKG protocol evolves, the changes are isolated to the Workhorse Go package rather than spread across Ruby service objects.

```plantuml
@startuml proposed_flow
skinparam backgroundColor #FFFFFF
skinparam sequenceArrowThickness 2
skinparam sequenceParticipantBorderColor #333333
skinparam sequenceLifeLineBorderColor #999999
skinparam noteBorderColor #CCCCCC
skinparam noteBackgroundColor #FFFFDD

title Proposed flow: Workhorse handles gRPC stream

actor Client
participant "Workhorse\n(Go)" as WH
participant "Puma Worker\n(Ruby)" as Puma
participant "GKG Server\n(Rust)" as GKG
database "ClickHouse" as CH
database "PostgreSQL" as PG

Client -> WH: POST /api/v4/orbit/query
WH -> Puma: Forward request
activate Puma #90EE90

note over Puma: Puma occupied\nfor milliseconds only

Puma -> Puma: Authenticate user
Puma -> Puma: Generate JWT\n(traversal_ids, org_id)
Puma --> WH: SendData header:\norbit-query:{GkgServer, JWT, Query}
deactivate Puma

note over Puma: Puma freed immediately

activate WH #4169E1
WH -> GKG: Open bidi gRPC stream\n(ExecuteQuery + JWT in metadata)
activate GKG
GKG -> CH: Execute ClickHouse query
CH --> GKG: Result set with resource IDs
GKG --> WH: RedactionRequired\n(resource IDs grouped by type)
deactivate GKG

WH -> Puma: POST /internal/orbit/redaction\n(user_id, resources)
activate Puma #90EE90

note over Puma: Puma occupied\nfor milliseconds only

Puma -> PG: Batch Ability.allowed? checks
PG --> Puma: Authorization decisions
Puma --> WH: JSON {authorizations: [...]}
deactivate Puma

note over Puma: Puma freed again

WH -> GKG: RedactionResponse\n(authorized ID map)
activate GKG
GKG -> GKG: Filter unauthorized rows
GKG --> WH: ExecuteQueryResult\n(final JSON)
deactivate GKG

WH --> Client: HTTP 200 + JSON body
deactivate WH
@enduml
```
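As a concrete sketch of the Workhorse side, the injecter below follows the shape of the existing Gitaly injecters built on `workhorse/internal/senddata/`. Everything here is illustrative, not a final API: the package name, the `orbitQueryParams` fields (taken from the header sketched in the diagram), and the `runQueryStream` helper are all assumptions.

```go
// Hypothetical package layout: workhorse/internal/orbitquery/injecter.go.
package orbitquery

import (
	"context"
	"fmt"
	"net/http"

	"gitlab.com/gitlab-org/gitlab/workhorse/internal/senddata"
)

// orbitQueryParams mirrors what Rails would serialize into the
// Gitlab-Workhorse-Send-Data header. Field names are assumptions
// taken from the diagram above (GkgServer, JWT, Query).
type orbitQueryParams struct {
	GkgServer string // address of the GKG gRPC server
	JWT       string // short-lived token carrying traversal_ids and org_id
	Query     string // the Knowledge Graph query to run
}

type query struct{ senddata.Prefix }

// SendQuery matches response headers of the form
// "Gitlab-Workhorse-Send-Data: orbit-query:<base64 JSON>".
var SendQuery = &query{"orbit-query:"}

func (q *query) Inject(w http.ResponseWriter, r *http.Request, sendData string) {
	var params orbitQueryParams
	// Unpack base64-decodes and JSON-unmarshals the header payload,
	// the same way the existing Gitaly injecters unpack theirs.
	if err := q.Unpack(&params, sendData); err != nil {
		http.Error(w, fmt.Sprintf("SendQuery: unpack sendData: %v", err), http.StatusInternalServerError)
		return
	}

	// At this point Rails has already responded and the Puma worker is
	// free. This goroutine owns the bidi stream for as long as it takes.
	if err := runQueryStream(r.Context(), w, params); err != nil {
		http.Error(w, "orbit query failed", http.StatusBadGateway)
	}
}

// runQueryStream (hypothetical helper) would dial params.GkgServer, open
// the ExecuteQuery bidi stream with the JWT in gRPC metadata, answer any
// RedactionRequired message via the Rails internal API, and stream the
// final JSON result to w.
func runQueryStream(ctx context.Context, w http.ResponseWriter, params orbitQueryParams) error {
	// Stream handling elided; see the redaction sketch further down.
	return fmt.Errorf("not implemented")
}
```

The embedded `senddata.Prefix` supplies the `Match` and `Name` methods, so the new type only implements `Inject`; like the Gitaly injecters, `SendQuery` would also need to be added to the list of injecters Workhorse wires into its response interceptor.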
### Why SendData/Injecter and not GOB proxy

Workhorse has two patterns for offloading work from Rails.

The GOB proxy (`workhorse/internal/gob/proxy.go`) is a full HTTP reverse proxy. Rails authorizes the request and returns an upstream URL. Workhorse proxies everything to the upstream service and streams the response back. This works when the upstream handles everything independently after auth.

SendData/Injecter (`workhorse/internal/senddata/`) works differently. Rails returns a special header with serialized parameters. Workhorse intercepts the response before it reaches the client, unpacks the parameters, and runs custom logic: gRPC calls, streaming, whatever the injecter does. Gitaly operations like `git-changed-paths`, `git-list-blobs`, and archive downloads all use this pattern.

The GOB proxy does not work for Knowledge Graph queries because the GKG bidirectional stream requires a mid-stream callback to Rails for redaction. A reverse proxy cannot inject messages into a gRPC bidi stream. SendData/Injecter gives Workhorse full control over the gRPC connection while still calling back to Rails when GKG requests authorization checks.

### Why redaction cannot leave Rails

The redaction layer calls `Ability.allowed?`, which evaluates the full DeclarativePolicy engine. That depends on ActiveRecord model loading, Postgres, SAML provider state, IP restrictions, confidentiality checks, custom roles, and the Ruby policy DSL. Reimplementing this in Go would mean duplicating GitLab's entire authorization model. Until a dedicated authorization service (like GLAZ/Zanzibar) exists, Rails is the single source of truth for Layer 3 access checks.
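The Workhorse side of the callback therefore stays deliberately thin. Below is a minimal sketch of what it could look like: the JSON field names, the authentication mechanism, and the helper name are assumptions; only the endpoint path comes from the design above.

```go
// Hypothetical continuation of the orbitquery package sketched earlier.
package orbitquery

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// redactionRequest carries the resource IDs GKG wants checked, grouped
// by type as in the sequence diagram (field names are assumptions).
type redactionRequest struct {
	UserID    int64              `json:"user_id"`
	Resources map[string][]int64 `json:"resources"`
}

// redactionResponse is the authorized ID map Rails returns (field name
// assumed from the diagram's {authorizations: [...]} payload).
type redactionResponse struct {
	Authorizations map[string][]int64 `json:"authorizations"`
}

// authorizeResources runs once per RedactionRequired message. It is the
// only point after the initial request where a Puma worker is occupied,
// and only for the duration of this single HTTP round trip.
func authorizeResources(ctx context.Context, railsURL string, req redactionRequest) (*redactionResponse, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}

	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost,
		railsURL+"/api/v4/internal/orbit/redaction", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	httpReq.Header.Set("Content-Type", "application/json")
	// Authentication elided: in practice this would use the same
	// shared-secret mechanism as other Workhorse -> Rails internal calls.

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("redaction callback: HTTP %d", resp.StatusCode)
	}

	var out redactionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}
```

Whatever Rails returns is forwarded to GKG in the `RedactionResponse` message; the policy evaluation itself never leaves Ruby.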
## Related issues and MRs

- knowledge-graph#320: Load testing overview (staging results that surfaced this problem)
- knowledge-graph#348: Redaction scalability series (parent issue)
- knowledge-graph!730: ADR 008, Workhorse query acceleration design document
- knowledge-graph!732: Go protobuf module for GKG service
- !229374: RedactionService eager-load associations
- !229378: RedactionService batch cache pre-seeding
- !229394: Workhorse orbit query acceleration (implementation)