Designing a more efficient backend transport for Git fetch
streamrpc: a protocol for Gitaly RPCs that do bulk data transfers
(Happy to hear suggestions for a better name than "streamrpc".)
Features
- a more efficient transport for `git fetch`-related RPCs
- co-exists with gRPC, not a replacement
- RPCs we want to replace: PostUploadPack, PackObjectsHook, SSHUploadPack
- can be rolled out / enabled across all GitLab installations without administrator action (re-uses existing network ports, authentication tokens, TLS certificates)
- works with all Gitaly connection types (unix, tcp, tls)
- works with Praefect
- re-uses existing Gitaly authentication, logging and instrumentation middleware
Caveats
- not compatible with gRPC load balancers, so this assumes Praefect uses a TCP load balancer (as we do on gitlab.com)
High level design
- Every streamrpc call establishes a new network connection: no connection reuse
- The server (Gitaly/Praefect) uses listener multiplexing to accept streamrpc connections
- Use rpctest/grpc_adapter to exchange request metadata at the start of the request (method name, repository metadata, authentication headers, etc.) and route the request to a handler function. grpc_adapter supports regular grpc-go middleware.
- Once the server accepts the request, client and server can exchange bytes via a stream socket without any extra protocol layers adding overhead
- RPCs are defined using protobuf just like all other Gitaly RPCs
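
To make the flow above concrete, here is a minimal Go sketch of one call: the client dials a fresh connection, sends a prefix that a multiplexing listener could use to recognise the connection, exchanges a small metadata header, and then both sides use the socket as a raw byte stream. Everything specific in it is an assumption for illustration, not the proposed wire format: the `magic` prefix, the JSON `header` type, the length-prefixed framing and the `/demo.Echo` method are all hypothetical, and plain JSON stands in for the grpc_adapter metadata exchange.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net"
)

// magic is a hypothetical prefix a multiplexing listener could peek at to
// tell these connections apart from gRPC (HTTP/2) ones on the same port.
const magic = "streamrpc00"

// header is a hypothetical stand-in for the request metadata.
type header struct {
	Method   string            `json:"method"`
	Metadata map[string]string `json:"metadata"`
}

// writeFrame writes a 4-byte big-endian length followed by the payload.
func writeFrame(w io.Writer, p []byte) error {
	if err := binary.Write(w, binary.BigEndian, uint32(len(p))); err != nil {
		return err
	}
	_, err := w.Write(p)
	return err
}

// readFrame reads one frame written by writeFrame.
func readFrame(r io.Reader) ([]byte, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	buf := make([]byte, n)
	_, err := io.ReadFull(r, buf)
	return buf, err
}

// serveConn handles exactly one call per connection (no connection reuse).
func serveConn(c net.Conn, handlers map[string]func(io.ReadWriter) error) error {
	defer c.Close()
	br := bufio.NewReader(c)
	prefix := make([]byte, len(magic))
	if _, err := io.ReadFull(br, prefix); err != nil {
		return err
	}
	if string(prefix) != magic {
		// A real multiplexer would hand this connection to the gRPC server.
		return errors.New("not a stream connection")
	}
	raw, err := readFrame(br)
	if err != nil {
		return err
	}
	var h header
	if err := json.Unmarshal(raw, &h); err != nil {
		return err
	}
	handler, ok := handlers[h.Method]
	if !ok {
		msg := "unknown method: " + h.Method
		writeFrame(c, []byte(msg))
		return errors.New(msg)
	}
	// An empty frame means "accepted"; after this the socket carries raw bytes.
	if err := writeFrame(c, nil); err != nil {
		return err
	}
	return handler(struct {
		io.Reader
		io.Writer
	}{br, c})
}

// call dials a new connection, performs the handshake, then runs body on it.
func call(addr, method string, md map[string]string, body func(net.Conn) error) error {
	c, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer c.Close()
	if _, err := io.WriteString(c, magic); err != nil {
		return err
	}
	raw, err := json.Marshal(header{Method: method, Metadata: md})
	if err != nil {
		return err
	}
	if err := writeFrame(c, raw); err != nil {
		return err
	}
	resp, err := readFrame(c)
	if err != nil {
		return err
	}
	if len(resp) > 0 {
		return fmt.Errorf("request rejected: %s", resp)
	}
	return body(c)
}

// demo starts a local server with an echo handler and makes one call to it.
func demo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	handlers := map[string]func(io.ReadWriter) error{
		"/demo.Echo": func(rw io.ReadWriter) error { _, err := io.Copy(rw, rw); return err },
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go serveConn(c, handlers)
		}
	}()
	var out []byte
	md := map[string]string{"authorization": "Bearer example-token"}
	err = call(ln.Addr().String(), "/demo.Echo", md, func(c net.Conn) error {
		if _, err := io.WriteString(c, "hello"); err != nil {
			return err
		}
		// Half-close so the echo handler sees EOF and finishes.
		if err := c.(*net.TCPConn).CloseWrite(); err != nil {
			return err
		}
		b, err := io.ReadAll(c)
		out = b
		return err
	})
	return string(out), err
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```

The sketch does not validate the metadata; in the design above that step would run through the existing grpc-go middleware (authentication, logging, instrumentation) before the handler is invoked.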
Edited by Jacob Vosmaer