Designing a more efficient backend transport for Git fetch

streamrpc: a protocol for Gitaly RPCs that do bulk data transfers

(Happy to hear suggestions for a better name than "streamrpc".)

Features

  • a more efficient transport for git fetch-related RPCs
  • co-exists with gRPC, not a replacement
  • RPCs we want to replace: PostUploadPack, PackObjectsHook, SSHUploadPack
  • can be rolled out / enabled across all GitLab installations without administrator action (re-uses existing network ports, authentication tokens, TLS certificates)
  • works with all Gitaly connection types (unix, tcp, tls)
  • works with Praefect
  • re-uses existing Gitaly authentication, logging and instrumentation middleware

Caveats

  • not compatible with gRPC load balancers, so this assumes Praefect uses a TCP load balancer (like we do on gitlab.com)

High level design

  1. Every streamrpc call establishes a new network connection: no connection reuse
  2. The server (Gitaly/Praefect) uses listener multiplexing to accept streamrpc connections
  3. Use rpctest/grpc_adapter to exchange request metadata at the start of the request (method name, repository metadata, authentication headers, etc.) and route the request to a handler function. grpc_adapter supports regular grpc-go middleware.
  4. Once the server accepts the request, client and server can exchange bytes via a stream socket without any extra protocol layers adding overhead
  5. RPCs are defined using protobuf just like all other Gitaly RPCs
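
To make steps 1, 3 and 4 concrete, here is a minimal Go sketch of the connection lifecycle: dial a fresh connection, send a small metadata header, wait for the server to accept, then exchange raw bytes with no further framing. Everything specific in it is an assumption made for illustration: the JSON header, the "OK"/"REJECT" status lines, the `request` type, the `"secret"` token check and the `serve`/`call`/`roundTrip` helpers are hypothetical. It does not use rpctest/grpc_adapter or protobuf, and it leaves out listener multiplexing (step 2) and middleware.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"net"
)

// request is a hypothetical handshake header. The real wire format is
// not specified in this proposal; JSON is used here only for brevity.
type request struct {
	Method string `json:"method"`
	Token  string `json:"token"`
}

// serve accepts a single connection, reads the one-line header, checks
// the (made-up) token, replies with a status line and, if accepted,
// writes the payload as raw bytes before closing the connection.
func serve(ln net.Listener, payload []byte) error {
	conn, err := ln.Accept()
	if err != nil {
		return err
	}
	defer conn.Close()

	line, err := bufio.NewReader(conn).ReadBytes('\n')
	if err != nil {
		return err
	}
	var req request
	if err := json.Unmarshal(line, &req); err != nil {
		return err
	}
	if req.Token != "secret" { // stand-in for real authentication
		_, err := io.WriteString(conn, "REJECT\n")
		return err
	}
	if _, err := io.WriteString(conn, "OK\n"); err != nil {
		return err
	}
	// From here on the socket carries plain bytes, no extra framing.
	_, err = conn.Write(payload)
	return err
}

// call opens a new connection for this one RPC (no reuse), sends the
// header, waits for the status line, then reads raw bytes until EOF.
func call(addr string, req request) ([]byte, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	defer conn.Close()

	// Encode writes the JSON object followed by a newline.
	if err := json.NewEncoder(conn).Encode(req); err != nil {
		return nil, err
	}
	r := bufio.NewReader(conn)
	status, err := r.ReadString('\n')
	if err != nil {
		return nil, err
	}
	if status != "OK\n" {
		return nil, fmt.Errorf("request rejected: %q", status)
	}
	return io.ReadAll(r)
}

// roundTrip runs one server and one client call over a loopback socket.
func roundTrip(token string, payload []byte) ([]byte, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	errc := make(chan error, 1)
	go func() { errc <- serve(ln, payload) }()

	out, err := call(ln.Addr().String(), request{Method: "PostUploadPack", Token: token})
	if serr := <-errc; serr != nil && err == nil {
		err = serr
	}
	return out, err
}

func main() {
	out, err := roundTrip("secret", []byte("example pack data"))
	fmt.Printf("received %d bytes, err=%v\n", len(out), err)
}
```

Because each call owns its connection, end-of-stream can be signalled simply by closing the socket, which is why the sketch needs no length prefixes or message framing after the handshake.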
Edited by Jacob Vosmaer