Fix BotsService not being stopped gracefully

Zehao Chen requested to merge zchen723/fix-bots-not-shut-down into master

Description

This MR attempts to fix an issue where BotsService cannot be stopped gracefully via SIGTERM.

How to reproduce it?

Compose the following setup, then run through the remaining steps:

  1. BuildGrid with bots
  2. One buildbox-worker connected to it
  3. Set keep-alive to a large value, e.g. 5 minutes
  4. Don't submit any executions
  5. Send SIGTERM to BuildGrid, e.g. with kill

Root cause

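A worker periodically polls BotsService via the UpdateBotsSession gRPC call. In our implementation, however, the call does not return immediately when no lease is available: the gRPC thread handling it sleeps while waiting for a job. Nothing wakes that thread when SIGTERM arrives, so with a large keep-alive and no pending executions it stays blocked and the service cannot stop gracefully.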
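The blocking behaviour described above can be illustrated with a minimal sketch. It is not BuildGrid's actual code: `update_bot_session`, `lease_available`, and `pending_leases` are hypothetical names, and a plain `threading.Condition` stands in for whatever mechanism the service really uses to wait for a lease.

```python
import threading
import time

# Hypothetical stand-ins for the lease queue; not BuildGrid's real code.
lease_available = threading.Condition()
pending_leases = []


def update_bot_session(keep_alive_seconds):
    """Return a lease, sleeping up to keep_alive_seconds if none is queued."""
    with lease_available:
        # The only wake-up conditions are "a lease arrived" or "timeout".
        # A shutdown request is not one of them, so the thread keeps sleeping.
        lease_available.wait_for(
            lambda: bool(pending_leases), timeout=keep_alive_seconds
        )
        return pending_leases.pop(0) if pending_leases else None


# With no lease queued, the call blocks for the whole keep-alive period.
start = time.monotonic()
result = update_bot_session(keep_alive_seconds=0.2)
elapsed = time.monotonic() - start
```

With a keep-alive of 5 minutes instead of 0.2 seconds, a handler like this would hold its thread for up to 5 minutes after the shutdown request.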

Fix

This fix wakes up the gRPC threads blocked in UpdateBotsSession when the service is stopping and effectively tells the worker to cancel the session. That is reasonable, since the service is being shut down anyway.
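A minimal sketch of the wake-up idea follows, assuming the waiting threads share a condition variable with the shutdown path. The class `LeaseWaiter`, its methods, and the `"CANCELLED"` return value are hypothetical and only illustrate the pattern; the actual change in this MR may be structured differently.

```python
import threading
import time


class LeaseWaiter:
    """Hypothetical helper: waits for a lease but can be woken on shutdown."""

    def __init__(self):
        self._cond = threading.Condition()
        self._leases = []
        self._stopping = False

    def stop(self):
        # Would be called from the service's shutdown path (e.g. on SIGTERM).
        with self._cond:
            self._stopping = True
            self._cond.notify_all()  # wake every sleeping handler thread

    def wait_for_lease(self, timeout):
        with self._cond:
            # Wake up on a new lease, on shutdown, or when the timeout expires.
            self._cond.wait_for(
                lambda: self._stopping or bool(self._leases), timeout=timeout
            )
            if self._stopping:
                return "CANCELLED"  # tell the caller to cancel the session
            return self._leases.pop(0) if self._leases else None


waiter = LeaseWaiter()
results = []
# Simulates a handler thread waiting with a 5-minute keep-alive.
worker = threading.Thread(target=lambda: results.append(waiter.wait_for_lease(300)))

start = time.monotonic()
worker.start()
time.sleep(0.1)   # let the handler start waiting
waiter.stop()     # simulated shutdown request
worker.join(timeout=5)
elapsed = time.monotonic() - start
```

Here the handler thread returns shortly after `stop()` is called rather than after the full 300-second timeout, which is what lets the surrounding server finish shutting down.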

Edited by Zehao Chen
