
Don't long poll for UpdateBotSession calls with low timeouts

Jeremiah Bonney requested to merge jbonney/buildgrid:jbonney/no-long-poll into master

Description

BuildGrid uses the gRPC timeout mechanism to determine how long it can hold an UpdateBotSession call open while waiting for work. To make sure it still has time to respond after it gets a lease, it subtracts NETWORK_TIMEOUT seconds from the time remaining. It then waits at most that long for a lease to give the worker before responding that it doesn't have anything.
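The wait computation described above (before this change) can be sketched as a pure function. This is an illustration, not BuildGrid's code: the function name `old_lease_wait` is hypothetical, the `NETWORK_TIMEOUT` value of 5 seconds is an assumption, and returning `None` to mean "don't long poll" is an assumed convention. In a Python gRPC servicer, `time_remaining` would typically come from `context.time_remaining()`, which returns `None` when the client set no deadline.

```python
from typing import Optional

# Hypothetical value for illustration; BuildGrid's real setting may differ.
NETWORK_TIMEOUT = 5.0


def old_lease_wait(time_remaining: Optional[float]) -> Optional[float]:
    """Pre-fix wait time (in seconds) for a lease.

    time_remaining: seconds left before the client's gRPC deadline,
    or None if the client did not set a deadline.
    Returns None to mean "don't long poll; respond immediately"
    (an assumed convention for this sketch).
    """
    if time_remaining is None:
        # No deadline: nothing to wait against, so respond right away.
        return None
    if time_remaining >= NETWORK_TIMEOUT:
        # Hold back NETWORK_TIMEOUT seconds so the response can still be sent.
        return time_remaining - NETWORK_TIMEOUT
    # Short deadline: nothing is held back, so the server waits out the
    # full remaining time and its response can arrive after the deadline.
    return time_remaining
```

With a 3-second deadline this returns 3.0, leaving no time to send the response.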

However, when the time remaining for a request is less than NETWORK_TIMEOUT, we don't subtract it and instead wait for the full remaining duration. This often results in DEADLINE_EXCEEDED errors, because the request times out before BuildGrid can finish responding. In these cases we shouldn't wait for work at all and should respond as quickly as possible, which is already the behavior when no deadline is specified in request_job_leases.
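A minimal sketch of the fixed rule, under the same assumptions as a standalone illustration: `new_lease_wait` is a hypothetical name, `NETWORK_TIMEOUT = 5.0` is an assumed value, and `None` stands for "no deadline passed to request_job_leases", i.e. respond without long polling. When fewer than NETWORK_TIMEOUT seconds remain, the call is treated as if no deadline were given.

```python
from typing import Optional

# Hypothetical value for illustration; BuildGrid's real setting may differ.
NETWORK_TIMEOUT = 5.0


def new_lease_wait(time_remaining: Optional[float]) -> Optional[float]:
    """Post-fix wait time (in seconds) for a lease.

    Returns None to mean "don't long poll; respond immediately",
    mirroring the no-deadline behavior (assumed convention).
    """
    if time_remaining is None or time_remaining < NETWORK_TIMEOUT:
        # No deadline, or too little time to both wait and respond:
        # skip waiting for work entirely.
        return None
    # Enough time: long poll, but keep NETWORK_TIMEOUT seconds in reserve.
    return time_remaining - NETWORK_TIMEOUT
```

The only behavioral difference from the previous rule is the short-deadline branch, which now answers immediately instead of consuming the whole budget.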

Edited by Jeremiah Bonney
