
Respect expire_time when no work given if BotStatus is not OK

Jeremiah Bonney requested to merge jbonney/unhealthy-bot-wait-time into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • Metrics N/A
  • Documentation update(s) N/A

Description

buildbox-worker selectively ignores the expire_time given by the Bots service when it has no work to do, so that it can respond to new work quickly. This optimization causes problems when the bot itself is in a non-OK state, for example when it is set to UNHEALTHY in the config file. In that case buildbox-worker sends an UpdateBotSession request, the remote Bots service returns no leases because the bot is unhealthy, and buildbox-worker sends another request after the default wait time (250ms) instead of waiting for an amount derived from expire_time. This produces a lot of unnecessary requests to the Bots service for no gain: as long as the bot is unhealthy, it will not get work no matter how often it asks.

This MR updates buildbox-worker to respect expire_time when the bot is not OK, waiting longer between UpdateBotSession calls, and adds a test for the new behavior.
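A minimal sketch of the wait-time decision described above, assuming a simplified model of the worker loop. The `BotStatus` enum mirrors the status names from the Remote Workers API but is a hypothetical stand-in, as are `nextWait`, `kExpireMargin` and the choice to wake up one second before expire_time; the real buildbox-worker code and its exact derivation from expire_time may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the BotStatus values used by the Bots service.
enum class BotStatus { OK, UNHEALTHY, HOST_REBOOTING, BOT_TERMINATING };

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Default idle poll interval mentioned in the MR description.
constexpr Millis kDefaultWait{250};

// Hypothetical safety margin so the next UpdateBotSession call is sent
// before the session's expire_time is reached.
constexpr Millis kExpireMargin{1000};

// Decide how long to sleep before the next UpdateBotSession call.
Millis nextWait(BotStatus status, bool hasWork, Clock::time_point now,
                Clock::time_point expireTime) {
    // Healthy and idle: ignore expire_time and poll quickly so new work
    // is picked up fast (the pre-existing optimization).
    if (status == BotStatus::OK && !hasWork) {
        return kDefaultWait;
    }
    // Otherwise (including any non-OK status such as UNHEALTHY) respect
    // expire_time: wait until just before it, clamped at zero.
    const Millis untilExpire =
        std::chrono::duration_cast<Millis>(expireTime - now) - kExpireMargin;
    return std::max(Millis{0}, untilExpire);
}
```

With the 250ms default, an idle bot that ignores expire_time sends about four UpdateBotSession requests per second; under this sketch an UNHEALTHY bot whose session expires in 30 seconds would instead sleep for roughly 29 seconds. Keeping the fast path only for healthy, idle bots preserves the quick response to new work that the original optimization was meant to provide.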
