
Update buildbox-worker to respond promptly to signals for graceful shutdown

Jeremiah Bonney requested to merge jbonney/async-bot-session into master

Description

This PR fixes buildbox-worker's shutdown handling so that it works as intended: buildbox-worker now responds promptly to a shutdown signal and starts a graceful shutdown. This required several changes, which I've tried to break down into logical commits. The highlights are:

  • Use the async gRPC methods to send CreateBotSession and UpdateBotSession requests, and abandon an in-flight request once a shutdown has been requested. This matters when an RWAPI server holds the connection open for long polling: before this change, buildbox-worker would hang until the request completed or timed out.
    • Also migrate the tests to use buildboxcommon::TestGrpcServer, since the mocks generated by gRPC only work with the synchronous APIs.
  • Instead of passing the wait time returned by calculateWaitTime directly to the condition variable wait_for calls, wait for a shorter duration and explicitly check whether an UpdateBotSession request needs to be sent. This lets buildbox-worker notice a signal right away and respond to it, instead of staying blocked for the full wait.
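A minimal sketch of the shorter-wait idea, assuming a process-wide std::atomic<bool> flag set by a SIGINT/SIGTERM handler installed with std::signal. The names g_shutdownRequested, handleShutdownSignal and waitUntilDueOrShutdown are hypothetical and not taken from buildbox-worker; the deadline stands in for the time derived from calculateWaitTime. It also assumes std::atomic<bool> is lock-free, as it is on mainstream platforms, which is what makes storing to it from a signal handler safe.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <mutex>

// Hypothetical process-wide flag set from a SIGINT/SIGTERM handler.
// A handler may store to a lock-free atomic, but it must not lock a mutex
// or notify a condition variable, so waiters have to poll this flag.
std::atomic<bool> g_shutdownRequested{false};

extern "C" void handleShutdownSignal(int /*signum*/)
{
    g_shutdownRequested.store(true);
}

// Hypothetical helper: block until `deadline` (for example, when the next
// UpdateBotSession is due according to calculateWaitTime), but never for
// longer than `slice` at a time, so a shutdown request is noticed within
// one slice. Returns true if shutdown was requested, false if the deadline
// was reached.
bool waitUntilDueOrShutdown(std::condition_variable &cv, std::mutex &mutex,
                            std::chrono::steady_clock::time_point deadline,
                            std::chrono::milliseconds slice)
{
    std::unique_lock<std::mutex> lock(mutex);
    while (!g_shutdownRequested.load()) {
        const auto now = std::chrono::steady_clock::now();
        if (now >= deadline) {
            return false; // Time to send UpdateBotSession.
        }
        // Wake at whichever comes first: the deadline or the end of this
        // slice. Spurious wakeups and notifications just re-run the checks.
        const std::chrono::steady_clock::time_point sliceEnd = now + slice;
        cv.wait_until(lock, std::min(deadline, sliceEnd));
    }
    return true;
}
```

The slice length bounds how long a signal can go unnoticed (roughly one slice), at the cost of a few extra wakeups per wait. Polling is needed here because a signal handler cannot safely notify the condition variable itself.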

NOTE: This PR now depends on buildbox-common!436, which adds support for specifying additional expected/OK statuses to GrpcRetrier.

Validation

Bring up a buildbox-worker process with a high --request-timeout value and send it a SIGINT/SIGTERM. Before this change, buildbox-worker would hang until the request timeout had elapsed; now it shuts down promptly.
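This scenario exercises the switch to async CreateBotSession/UpdateBotSession calls. The sketch below shows the general pattern under stated assumptions: wait for the response in short slices, and if a shutdown has been requested, cancel the in-flight call instead of waiting out the server's long poll. Real gRPC code might, for example, poll a grpc::CompletionQueue with a deadline and call grpc::ClientContext::TryCancel(); here std::async and std::future stand in for the RPC so the example runs without gRPC, and startFakeLongPoll, awaitResponseOrShutdown and RequestOutcome are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

enum class RequestOutcome { Completed, CancelledForShutdown };

// Hypothetical stand-in for an async UpdateBotSession call: the "server"
// holds the request open for `serverHoldTime` unless `cancelled` is set
// first, loosely like a long-polling server whose call the client cancels.
// The future yields true if the server responded, false if cancelled.
// `cancelled` must outlive the returned future.
std::future<bool> startFakeLongPoll(std::chrono::milliseconds serverHoldTime,
                                    std::atomic<bool> &cancelled)
{
    return std::async(std::launch::async, [serverHoldTime, &cancelled] {
        const auto deadline = std::chrono::steady_clock::now() + serverHoldTime;
        while (std::chrono::steady_clock::now() < deadline) {
            if (cancelled.load()) {
                return false;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
        return true;
    });
}

// Wait for `response` in slices of `slice`; between slices, check whether a
// shutdown was requested and, if so, cancel the call rather than waiting for
// the server (or the request timeout) to end it.
RequestOutcome awaitResponseOrShutdown(std::future<bool> &response,
                                       std::atomic<bool> &cancelled,
                                       const std::atomic<bool> &shutdownRequested,
                                       std::chrono::milliseconds slice)
{
    while (response.wait_for(slice) != std::future_status::ready) {
        if (shutdownRequested.load()) {
            cancelled.store(true);
            // Returns quickly: the call either sees the cancellation within
            // one poll interval or has just finished on its own.
            response.wait();
            return RequestOutcome::CancelledForShutdown;
        }
    }
    return response.get() ? RequestOutcome::Completed
                          : RequestOutcome::CancelledForShutdown;
}
```

The cancellation step matters: the future returned by std::async blocks in its destructor until the task finishes, so simply returning early would still wait out the long poll.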

Similarly, assign a long-running job to buildbox-worker and send it a SIGINT/SIGTERM. Before this change, buildbox-worker would wait until the assigned expire_time had elapsed, which could take minutes. After this change, buildbox-worker promptly starts a graceful shutdown.

Edited by Jeremiah Bonney
