Restart Unicorn and Sidekiq when GRPC throws 14:Endpoint read failed

Jacob Vosmaer requested to merge grpc-unavailable-restart into master

What does this MR do?

The current version of the grpc gem (1.8.x) has a bug that can leave a Ruby process unable to make any RPC calls after an RPC server restarts. When this bug hits, a Unicorn worker or Sidekiq server can no longer make Gitaly RPC calls. The only known way to recover is to restart the process.

This change causes the current process to restart as soon as a Gitaly call encounters the error associated with this bug. In the case of Unicorn we use a Unix signal to request a graceful shutdown after the current request has finished. In the case of Sidekiq we hook into the 'memory shutdown' middleware, which already performs a graceful shutdown when a process leaks memory.
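The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: `FakeUnavailable` stands in for the gRPC "unavailable" error so the example runs without the grpc gem, and `RestartOnEndpointReadFailed`, `wrap` and the `shutdown:` callable are hypothetical names. The error is still re-raised so the caller sees the failure; the shutdown is only requested, not forced.

```ruby
# Stand-in for the gRPC "unavailable" error class, so this sketch
# runs without the grpc gem installed (assumption for illustration).
class FakeUnavailable < StandardError; end

# Hypothetical helper: wraps a Gitaly call and requests a process
# restart when the error message matches the known-bad condition.
module RestartOnEndpointReadFailed
  MARKER = '14:Endpoint read failed'

  # shutdown: any callable that requests a graceful shutdown of the
  # current process (a Unix signal for Unicorn, a middleware hook for
  # Sidekiq).
  def self.wrap(shutdown:)
    yield
  rescue FakeUnavailable => e
    # Only react to the specific error tied to the bug; other
    # "unavailable" errors are passed through untouched.
    shutdown.call if e.message.include?(MARKER)
    raise
  end
end
```

In a Unicorn worker the `shutdown:` callable might be something like `-> { Process.kill('QUIT', Process.pid) }`, assuming the worker treats `QUIT` as "exit after the current request"; the exact signal used by this MR is not stated in the description.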

This change also fixes a bug in the Sidekiq shutdown code, which was trying to send a nonexistent Unix signal to the process. This bug prevented the Sidekiq shutdown from ever taking place.
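One way to guard against this class of mistake is to check a signal name against the list Ruby knows about before sending it. The sketch below is an assumption for illustration only; `known_signal?` is a hypothetical helper and is not part of this MR, and the description does not say which signal name was wrong.

```ruby
# Hypothetical helper: true if Ruby recognises the signal name on
# this platform. Signal.list uses names without the "SIG" prefix,
# so the prefix is stripped before the lookup.
def known_signal?(name)
  Signal.list.key?(name.to_s.sub(/\ASIG/, ''))
end
```

A spec could then assert `known_signal?` for every signal name the shutdown code uses, so a misspelled name fails in CI instead of silently preventing the shutdown.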

Are there points in the code the reviewer needs to double check?

Why was this MR needed?

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes gitaly#1036 (closed)

Related to gitaly#1029 (closed) and https://gitlab.com/gitlab-org/gitlab-ce/issues/43254.

Edited by Jacob Vosmaer