Enable server gRPC keepalives
We enabled client-side gRPC keepalives in gitlab!73302 (merged) to fix load balancer timeouts (https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit/-/issues/290) as described in gitlab!78966 (comment 818986276).
However, this created a different problem: long gRPC calls can be shut down prematurely with GOAWAY messages because the gRPC server only allows 2 keepalives in a 2-hour window. We see these as 14: Socket closed messages (https://sentry.gitlab.net/gitlab/gitlabcom/issues/3090872/events/). ReplicateRepository also fails with GOAWAY messages (https://log.gprd.gitlab.net/goto/909cb9a0-7d62-11ec-9dd2-93d354bef8e7).
We may want to investigate server gRPC keepalives instead:
diff --git a/internal/gitaly/server/server.go b/internal/gitaly/server/server.go
index 06a665dcf..b808aa76c 100644
--- a/internal/gitaly/server/server.go
+++ b/internal/gitaly/server/server.go
@@ -147,6 +147,9 @@ func New(
 			MinTime:             20 * time.Second,
 			PermitWithoutStream: true,
 		}),
+		grpc.KeepaliveParams(keepalive.ServerParameters{
+			Time: 5 * time.Minute,
+		}),
 	}
 
 	return grpc.NewServer(opts...), nil
Go 1.13 also enables 15-second TCP keepalives by default. I'd also like to know why this doesn't solve the problem.
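For reference, this is roughly where the TCP-level keepalive period lives in Go's standard library. The sketch below sets it explicitly through net.ListenConfig rather than relying on the default; the helper listenWithKeepAlive and the constant defaultKeepAlive are hypothetical names, and nothing here is taken from Gitaly's code.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// defaultKeepAlive mirrors the 15-second period mentioned above. It is set
// explicitly here for illustration only.
const defaultKeepAlive = 15 * time.Second

// listenWithKeepAlive opens a TCP listener whose accepted connections use
// the given TCP keepalive period. A negative period disables keepalives.
func listenWithKeepAlive(addr string, period time.Duration) (net.Listener, error) {
	lc := net.ListenConfig{KeepAlive: period}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	// Port 0 asks the OS for any free port.
	ln, err := listenWithKeepAlive("127.0.0.1:0", defaultKeepAlive)
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

These probes are sent by the operating system at the TCP layer, which is separate from the HTTP/2 PING frames that gRPC keepalives use.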
/cc: @wchandler