Add gRPC endpoint with authentication that can merge results from multiple Zoekt nodes
Problem to solve
Currently, we have to use threads to combine results from multiple nodes. We also have gitlab-zoekt-indexer!310 (merged), which moves the existing logic into the indexer, but I think our long-term solution should be a gRPC endpoint added to the webserver binary.
Proposal
Using a gRPC endpoint would let us stream results from multiple nodes concurrently and stop as soon as we reach 5,000 results (our limit). It would also make it possible to stream results to GitLab (or other clients).
```mermaid
sequenceDiagram
participant Client as Client
participant Zoekt1 as Zoekt Node 1
participant Zoekt2 as Zoekt Node 2
participant Zoekt3 as Zoekt Node 3
Client->>Zoekt1: Send search request via gRPC
Zoekt1-->>Zoekt1: Process local search
Zoekt1->>Zoekt2: Send search request to Zoekt Node 2
Zoekt1->>Zoekt3: Send search request to Zoekt Node 3
Zoekt2-->>Zoekt1: Return partial search results
Zoekt3-->>Zoekt1: Return partial search results
Zoekt1-->>Client: Aggregate and return results
```
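The "Aggregate and return results" step could look roughly like the sketch below: the coordinating node combines its local hits with the partial results from the other nodes and returns the top entries. It assumes each hit carries a relevance score where higher is better; `FileMatch` and `aggregate` are hypothetical and simplified, not Zoekt's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// FileMatch is a hypothetical, simplified per-file hit (assumed score field).
type FileMatch struct {
	Node     string
	FileName string
	Score    float64
}

// aggregate merges local and remote partial results, orders them by
// descending score, and keeps at most limit entries.
func aggregate(local []FileMatch, remote [][]FileMatch, limit int) []FileMatch {
	all := append([]FileMatch(nil), local...)
	for _, part := range remote {
		all = append(all, part...)
	}
	// SliceStable keeps arrival order for hits with equal scores.
	sort.SliceStable(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	if limit >= 0 && len(all) > limit {
		all = all[:limit]
	}
	return all
}

func main() {
	local := []FileMatch{{"zoekt-1", "a.go", 0.9}, {"zoekt-1", "b.go", 0.2}}
	remote := [][]FileMatch{
		{{"zoekt-2", "c.go", 0.7}},
		{{"zoekt-3", "d.go", 0.95}, {"zoekt-3", "e.go", 0.1}},
	}
	for _, m := range aggregate(local, remote, 3) {
		fmt.Printf("%s %s %.2f\n", m.Node, m.FileName, m.Score)
	}
	// zoekt-3 d.go 0.95
	// zoekt-1 a.go 0.90
	// zoekt-2 c.go 0.70
}
```

If every node already returns its results sorted by score, a k-way merge would avoid re-sorting the combined slice; the full sort is used here only to keep the sketch short.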
Context
Other important items:
- The new endpoint should be part of gitlab-zoekt-webserver, not the indexer, since we want to split reads and writes into different processes. That was the reason behind gitlab-zoekt-indexer!218 (merged).
- We should add JWT authentication to the new gRPC endpoint.
- In this comment, the Zoekt maintainer shares some ideas on how to aggregate shards efficiently.
Edited by Dmitry Gruzd