# Move GKG queries to Workhorse
## Problem
Every Knowledge Graph query blocks a Puma worker for the full round trip. Authentication, JWT generation, the bidirectional gRPC stream to GKG, the ClickHouse query, the redaction exchange against Postgres, and the final result delivery all run synchronously inside one Ruby thread. Nothing yields control back to the web server until the entire sequence finishes.
A simple graph query can take hundreds of milliseconds. A complex traversal can take several seconds. During that window, the Puma thread is unavailable for anything else.
We discovered this during [load testing in staging](https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/320). Under moderate concurrent load, Puma saturation climbed quickly. At higher request rates the system stopped recovering: gRPC deadline timeouts, 502s from overwhelmed workers, and ClickHouse running out of memory. The bottleneck was not ClickHouse or GKG. It was Puma threads held open by long-lived gRPC streams.
The issue is structural. Ruby/Puma is synchronous. Each worker handles one request at a time. The Knowledge Graph query pattern holds a bidirectional stream open for seconds while waiting on ClickHouse, then makes a callback to Postgres for authorization. That does not fit the Puma model. Workhorse, written in Go, can hold thousands of concurrent connections in goroutines. It already solves this same problem for Gitaly file streaming, archive downloads, and blob serving.
### Current architecture
The request enters Workhorse, gets forwarded to Rails, and Rails handles the rest: auth, gRPC client setup, the bidirectional stream, the redaction exchange, and the response. The Puma worker is occupied the entire time.
```plantuml
@startuml current_flow
skinparam backgroundColor #FFFFFF
skinparam sequenceArrowThickness 2
skinparam sequenceParticipantBorderColor #333333
skinparam sequenceLifeLineBorderColor #999999
skinparam noteBorderColor #CCCCCC
skinparam noteBackgroundColor #FFFFDD
title Current flow: Rails handles everything
actor Client
participant "Workhorse\n(Go)" as WH
participant "Puma Worker\n(Ruby)" as Puma
participant "GKG Server\n(Rust)" as GKG
database "ClickHouse" as CH
database "PostgreSQL" as PG
Client -> WH: POST /api/v4/orbit/query
WH -> Puma: Forward request
activate Puma #FF6666
note over Puma: Puma thread blocked\nfor entire duration\n(up to several seconds)
Puma -> Puma: Authenticate user
Puma -> Puma: Generate JWT\n(traversal_ids, org_id)
Puma -> GKG: Open bidi gRPC stream\n(ExecuteQuery)
activate GKG
GKG -> CH: Execute ClickHouse query
CH --> GKG: Result set with resource IDs
GKG --> Puma: RedactionRequired\n(resource IDs grouped by type)
deactivate GKG
Puma -> PG: Batch load resources\n+ Ability.allowed? checks
PG --> Puma: Authorization decisions
Puma -> GKG: RedactionResponse\n(authorized ID map)
activate GKG
GKG -> GKG: Filter unauthorized rows
GKG --> Puma: ExecuteQueryResult\n(final JSON)
deactivate GKG
Puma --> WH: HTTP 200 + JSON body
deactivate Puma
WH --> Client: Response
note over Puma: Worker freed after\nfull round trip completes
@enduml
```
### Why this matters at scale
GitLab.com runs a fixed number of Puma workers, several per pod across hundreds of pods. Each worker handles one request at a time. When Knowledge Graph queries hold workers for seconds, those workers are unavailable for everything else on Rails: merge request pages, API calls, CI status checks. A sustained burst of graph queries can starve unrelated features of capacity.
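The capacity arithmetic follows directly from Little's law: the average number of workers held open equals the arrival rate times the time each request holds a worker. The numbers below are illustrative, not measured:

```go
package main

import "fmt"

func main() {
	// Little's law: workers held = arrival rate x time each request holds a worker.
	// These numbers are illustrative only, not staging measurements.
	const (
		queriesPerSecond = 50.0 // sustained graph-query rate
		holdSeconds      = 2.0  // how long each query blocks a Puma worker
	)
	workersHeld := queriesPerSecond * holdSeconds
	fmt.Printf("%.0f Puma workers held open on average\n", workersHeld) // prints 100
}
```

At even modest rates, seconds-long holds translate into triple-digit worker occupancy, which is exactly the saturation pattern the load tests observed.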
This is not a theoretical concern. Workhorse was built to solve exactly this class of problem for git operations. CI archive downloads, blob streaming, and diff generation all use the SendData/Injecter pattern to free Puma workers immediately after authentication.
## Proposed solution
Move the bidirectional gRPC stream from Puma to Workhorse using the same SendData/Injecter pattern that Gitaly operations already use. Rails stays responsible for authentication and authorization, since it is the only component with access to the full DeclarativePolicy engine and Postgres. But the long-running gRPC stream, the ClickHouse wait, and the result delivery all happen in Workhorse goroutines instead.
The Puma worker gets occupied twice, briefly, instead of once for the full duration:
1. Initial request (milliseconds): Authenticate the user, generate the JWT with traversal IDs, return a `Gitlab-Workhorse-Send-Data: orbit-query:{params}` header with an empty body. Puma is freed.
2. Redaction callback (milliseconds): When GKG sends a `RedactionRequired` message mid-stream, Workhorse sends an HTTP POST to `/api/v4/internal/orbit/redaction`. Puma runs the batch authorization check against Postgres and returns. Puma is freed again.
Everything else happens in Go.
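As a sketch, the `orbit-query:{params}` payload could be packed on the Rails side and unpacked in Workhorse like this. The `prefix:base64(JSON)` encoding mirrors what existing Workhorse injecters use; the field names (`GkgServer`, `JWT`, `Query`) are taken from the sequence diagram and are not a finalized contract:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

// orbitQueryParams mirrors what Rails would serialize into the
// Gitlab-Workhorse-Send-Data header. Field names are illustrative.
type orbitQueryParams struct {
	GkgServer string `json:"GkgServer"` // GKG gRPC endpoint
	JWT       string `json:"JWT"`       // short-lived token carrying traversal_ids, org_id
	Query     string `json:"Query"`     // the graph query to execute
}

const prefix = "orbit-query:"

// pack is what Rails conceptually does: JSON-encode the params and
// base64-encode them behind the injecter prefix.
func pack(p orbitQueryParams) (string, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return prefix + base64.URLEncoding.EncodeToString(raw), nil
}

// unpack is what the Workhorse injecter does after matching the prefix.
func unpack(header string) (orbitQueryParams, error) {
	var p orbitQueryParams
	if !strings.HasPrefix(header, prefix) {
		return p, fmt.Errorf("not an orbit-query header")
	}
	raw, err := base64.URLEncoding.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return p, err
	}
	err = json.Unmarshal(raw, &p)
	return p, err
}

func main() {
	h, _ := pack(orbitQueryParams{GkgServer: "gkg:50051", JWT: "token", Query: "query"})
	p, err := unpack(h)
	fmt.Println(err == nil, p.GkgServer) // prints: true gkg:50051
}
```

Because the params ride in a response header rather than a body, Rails can return immediately with an empty body and the Puma worker is released before any gRPC work begins.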
This architecture further decouples GKG from Rails. Today, the Rails gRPC client is tightly bound to the GKG protobuf contract and stream lifecycle. Moving the gRPC client to Workhorse means Rails only needs to know how to generate a JWT and where GKG lives. It does not need to manage stream state, handle protobuf deserialization, or coordinate the redaction exchange inline. If the GKG protocol evolves, the changes are isolated to the Workhorse Go package rather than spread across Ruby service objects.
```plantuml
@startuml proposed_flow
skinparam backgroundColor #FFFFFF
skinparam sequenceArrowThickness 2
skinparam sequenceParticipantBorderColor #333333
skinparam sequenceLifeLineBorderColor #999999
skinparam noteBorderColor #CCCCCC
skinparam noteBackgroundColor #FFFFDD
title Proposed flow: Workhorse handles gRPC stream
actor Client
participant "Workhorse\n(Go)" as WH
participant "Puma Worker\n(Ruby)" as Puma
participant "GKG Server\n(Rust)" as GKG
database "ClickHouse" as CH
database "PostgreSQL" as PG
Client -> WH: POST /api/v4/orbit/query
WH -> Puma: Forward request
activate Puma #90EE90
note over Puma: Puma occupied\nfor milliseconds only
Puma -> Puma: Authenticate user
Puma -> Puma: Generate JWT\n(traversal_ids, org_id)
Puma --> WH: SendData header:\norbit-query:{GkgServer, JWT, Query}
deactivate Puma
note over Puma: Puma freed immediately
activate WH #4169E1
WH -> GKG: Open bidi gRPC stream\n(ExecuteQuery + JWT in metadata)
activate GKG
GKG -> CH: Execute ClickHouse query
CH --> GKG: Result set with resource IDs
GKG --> WH: RedactionRequired\n(resource IDs grouped by type)
deactivate GKG
WH -> Puma: POST /internal/orbit/redaction\n(user_id, resources)
activate Puma #90EE90
note over Puma: Puma occupied\nfor milliseconds only
Puma -> PG: Batch Ability.allowed? checks
PG --> Puma: Authorization decisions
Puma --> WH: JSON {authorizations: [...]}
deactivate Puma
note over Puma: Puma freed again
WH -> GKG: RedactionResponse\n(authorized ID map)
activate GKG
GKG -> GKG: Filter unauthorized rows
GKG --> WH: ExecuteQueryResult\n(final JSON)
deactivate GKG
WH --> Client: HTTP 200 + JSON body
deactivate WH
@enduml
```
### Why SendData/Injecter and not GOB proxy
Workhorse has two patterns for offloading work from Rails.
The GOB proxy (`workhorse/internal/gob/proxy.go`) is a full HTTP reverse proxy. Rails authorizes the request and returns an upstream URL. Workhorse proxies everything to the upstream service and streams the response back. This works when the upstream handles everything independently after auth.
SendData/Injecter (`workhorse/internal/senddata/`) works differently. Rails returns a special header with serialized parameters. Workhorse intercepts the response before it reaches the client, unpacks the parameters, and runs custom logic: gRPC calls, streaming, whatever the injecter does. Gitaly operations like `git-changed-paths`, `git-list-blobs`, and archive downloads all use this pattern.
The GOB proxy does not work for Knowledge Graph queries because the GKG bidirectional stream requires a mid-stream callback to Rails for redaction. A reverse proxy cannot inject messages into a gRPC bidi stream. SendData/Injecter gives Workhorse full control over the gRPC connection while still calling back to Rails when GKG requests authorization checks.
### Why redaction cannot leave Rails
The redaction layer calls `Ability.allowed?`, which evaluates the full DeclarativePolicy engine. That depends on ActiveRecord model loading, Postgres, SAML provider state, IP restrictions, confidentiality checks, custom roles, and the Ruby policy DSL. Reimplementing this in Go would mean duplicating GitLab's entire authorization model. Until a dedicated authorization service (like GLAZ/Zanzibar) exists, Rails is the single source of truth for Layer 3 access checks.
## Related issues and MRs
- knowledge-graph#320: Load testing overview (staging results that surfaced this problem)
- knowledge-graph#348: Redaction scalability series (parent issue)
- knowledge-graph!730: ADR 008, Workhorse query acceleration design document
- knowledge-graph!732: Go protobuf module for GKG service
- !229374: RedactionService eager-load associations
- !229378: RedactionService batch cache pre-seeding
- !229394: Workhorse orbit query acceleration (implementation)