Restart Unicorn and Sidekiq when GRPC throws 14:Endpoint read failed
What does this MR do?
The current version of the grpc gem (1.8.x) has a bug that can leave a Ruby process unable to make any RPC calls after an RPC server restarts. When this bug hits, a Unicorn worker or Sidekiq server can no longer make Gitaly RPC calls. The only known way to recover is to restart the process.
This change causes the current process to restart as soon as a Gitaly call encounters the error associated with this bug. For Unicorn, we use a Unix signal to request a graceful shutdown after the current request has finished. For Sidekiq, we hook into the 'memory shutdown' middleware, which performs a graceful shutdown in the case of memory leaks.
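As a rough illustration of the approach described above (not the code in this MR), the restart trigger could be sketched as follows. The method name, the `RpcUnavailable` stand-in class, and the restarter lambdas are all hypothetical. The use of `QUIT` for Unicorn is an assumption based on Unicorn's documented worker signals; this description does not name the signal that is actually sent.

```ruby
# Hypothetical sketch; none of these names are taken from the actual change.
ENDPOINT_READ_FAILED = '14:Endpoint read failed'

# Stand-in for the error class raised by the grpc gem, defined here only so
# the sketch runs without the gem installed (assumption).
class RpcUnavailable < StandardError; end

# Runs the block (the RPC call). If it fails with the message associated
# with the bug, asks the restarter to schedule a graceful restart, then
# re-raises so the caller still sees the failure.
def with_restart_on_endpoint_failure(restarter)
  yield
rescue RpcUnavailable => e
  restarter.call if e.message.include?(ENDPOINT_READ_FAILED)
  raise
end

# Unicorn-style restarter (assumption): a worker that receives QUIT exits
# gracefully after finishing the current request. Defined but not invoked here.
unicorn_restarter = -> { Process.kill('QUIT', Process.pid) }

# Example with a restarter that only logs, so running this is harmless.
logging_restarter = -> { warn 'restart requested' }
begin
  with_restart_on_endpoint_failure(logging_restarter) do
    raise RpcUnavailable, ENDPOINT_READ_FAILED
  end
rescue RpcUnavailable
  # The original error still propagates to the caller.
end
```

Passing the restarter in as a callable keeps the detection logic identical for Unicorn and Sidekiq; only the shutdown mechanism differs.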
In addition, this change fixes a bug in the Sidekiq shutdown code, which was trying to send a nonexistent Unix signal to the process. This bug prevented the Sidekiq shutdown from ever taking place.
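For context on this class of bug: Ruby's `Process.kill` raises `ArgumentError` for a signal name the platform does not know, so nothing is delivered. The sketch below shows one way to check a name up front. The helper `known_signal?` and the name `SIGBOGUS` are made up for illustration; the signal name involved in the real bug is not given in this description.

```ruby
# Returns true when this platform's Ruby recognises the signal name.
# Signal.list keys omit the "SIG" prefix (e.g. "TERM", not "SIGTERM").
def known_signal?(name)
  Signal.list.key?(name.to_s.sub(/\ASIG/, ''))
end

known_signal?('SIGTERM')  # => true
known_signal?('SIGBOGUS') # => false

# Process.kill rejects an unknown name before signalling anything.
begin
  Process.kill('SIGBOGUS', Process.pid)
rescue ArgumentError => e
  warn "signal not sent: #{e.message}"
end
```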
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Has been reviewed by UX
- Has been reviewed by Frontend
- Has been reviewed by Backend
- Has been reviewed by Database
- Conforms to the merge request performance guides
- Conforms to the style guides
- Squashed related commits together
- Internationalization required/considered
- End-to-end tests pass (package-qa manual pipeline job)
What are the relevant issue numbers?
Closes gitaly#1036