Skip to content

grpc-proxy: Fix flake caused by race between serving and stopping

Patrick Steinhardt requested to merge pks-grpc-proxy-fix-flaky-serve into master

The error propagation tests for our gRPC proxy set up two gRPC servers, the proxy server and the backend server. As we verify that errors are correctly propagated at different leveles when proxying, some of the subtests will never cause the backend server to be hit in case the proxy server already retruns an error.

This can cause a race in our test setup between starting to serve and tearing down the backend server when the backend server is never being connected to. As nothing will synchronize the Goroutine that starts to serve with the deferred function to stop it, they may be executed out of order so that we first stop and then try to serve the server. This will cause grpc.Server.Serve() to return an error:

=== FAIL: internal/praefect/grpc-proxy/proxy TestProxyErrorPropagation/director_error_is_propagated (0.01s)
    handler_ext_test.go:358:
                Error Trace:    /home/pks/Development/gitlab/gdk/gitaly/internal/praefect/grpc-proxy/proxy/testhelper.go:123
                                                        /home/pks/Development/gitlab/gdk/gitaly/internal/praefect/grpc-proxy/proxy/handler_ext_test.go:358
                                                        /home/pks/Development/gitlab/gdk/gitaly/internal/praefect/grpc-proxy/proxy/asm_amd64.s:1594
                Error:          Received unexpected error:
                                grpc: the server has been stopped
                Test:           TestProxyErrorPropagation/director_error_is_propagated
    --- FAIL: TestProxyErrorPropagation/director_error_is_propagated (0.01s)

Fix this by explicitly handling and ignoring grpc.ErrServerStopped in this test. While we could move this error handling into MustServe(), it does not feel right to do so as it indeed indicates a real error when trying to serve. Furthermore, this error is really quite specific to this test setup as you can typically expect to hit the gRPC server that you're setting up.

Closes #4595 (closed).

Edited by Patrick Steinhardt

Merge request reports