Cancellations incorrectly clear the lease instead of updating its status
Context
When an operation is cancelled, the Bots service should set the corresponding lease's state to CANCELLED to tell the bot to cancel the job. However, BuildGrid currently "clears" the lease instead (removing all property values to leave an empty message). This doesn't break BuildGrid's own bot, since it explicitly checks for this condition, but reccworker echoes the invalid lease back to BuildGrid, which confuses it further and prevents it from issuing subsequent jobs to that worker.
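To make the difference concrete, here is a minimal sketch that contrasts the two behaviours. The `Lease` and `LeaseState` types are simplified stand-ins for the protobuf messages, and `cancel_by_clearing` and `cancel_by_state` are hypothetical helper names used only for illustration; none of this is BuildGrid's actual code.

```python
from dataclasses import dataclass
from enum import Enum


class LeaseState(Enum):
    # Stand-in for the lease state enum; the numeric values are illustrative only.
    LEASE_STATE_UNSPECIFIED = 0
    PENDING = 1
    ACTIVE = 2
    COMPLETED = 4
    CANCELLED = 5


@dataclass
class Lease:
    # Simplified stand-in for the protobuf Lease message (assumption).
    id: str = ""
    state: LeaseState = LeaseState.LEASE_STATE_UNSPECIFIED


def cancel_by_clearing(lease: Lease) -> Lease:
    """Reported behaviour: every field is reset, leaving an empty message."""
    return Lease()


def cancel_by_state(lease: Lease) -> Lease:
    """Expected behaviour: keep the lease's identity and change only its state."""
    return Lease(id=lease.id, state=LeaseState.CANCELLED)


active = Lease(id="job-1", state=LeaseState.ACTIVE)
cleared = cancel_by_clearing(active)
cancelled = cancel_by_state(active)

# The cancelled lease can still be matched to the running job by its id;
# the cleared lease carries neither an id nor a meaningful state.
print(cleared)
print(cancelled)
```

A worker that receives the cleared lease has nothing to match against its running job, which is consistent with reccworker simply echoing it back.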
Steps to reproduce
- Optionally, add logging statements to reccworker to log the sent/received bot session.
- Start a BuildGrid server and run reccworker.
- Start a long-running job with
bgd execute command [some directory] -- sleep 100
- Cancel the job with
bgd operation cancel [operation id]
- Attempt to run a new job with
bgd execute command [some directory] -- echo hello
Expected result
- The lease's state is set to CANCELLED, the worker acknowledges this, then the job is removed.
- The new job is executed.
Actual result
- The lease is "cleared" (replaced with a completely empty Lease message) and never removed.
- The new job stays queued forever.
Task Description
This bit of BotsInterface.update_bot_session is suspect -- it looks like the lease could be getting cleared if BotsInterface._check_lease_state is incorrectly returning None. (Also, shouldn't it be deleting the lease instead of clearing it?)
Acceptance Criteria
Running the above "Steps to Reproduce" correctly cancels the job and allows a new one to be run. (Also, we should probably have a test for this.)
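As a starting point for that test, the sketch below shows the shape a regression test could take. `FakeBotsService` is a hypothetical in-memory stand-in written for this example, not BuildGrid's BotsInterface, and the string lease states are a simplification; a real test would drive the actual service and its protobuf messages.

```python
from dataclasses import dataclass, field
from typing import List, Set

ACTIVE = "ACTIVE"
CANCELLED = "CANCELLED"


@dataclass
class Lease:
    # Simplified stand-in for the protobuf Lease message (assumption).
    id: str
    state: str


@dataclass
class FakeBotsService:
    """Hypothetical in-memory stand-in, NOT BuildGrid's BotsInterface."""
    queue: List[str] = field(default_factory=list)
    cancelled: Set[str] = field(default_factory=set)

    def cancel(self, job_id: str) -> None:
        self.cancelled.add(job_id)

    def update_bot_session(self, leases: List[Lease]) -> List[Lease]:
        result = []
        for lease in leases:
            if lease.state == CANCELLED:
                continue  # worker acknowledged the cancellation: remove the lease
            if lease.id in self.cancelled:
                lease = Lease(lease.id, CANCELLED)  # ask the worker to cancel
            result.append(lease)
        if not result and self.queue:
            # The worker is idle, so hand it the next queued job.
            result.append(Lease(self.queue.pop(0), ACTIVE))
        return result


def test_cancel_then_run_new_job() -> None:
    service = FakeBotsService(queue=["sleep-job"])
    session = service.update_bot_session([])
    assert session == [Lease("sleep-job", ACTIVE)]

    service.cancel("sleep-job")
    service.queue.append("echo-job")

    # The lease must come back CANCELLED, not as an empty message.
    session = service.update_bot_session(session)
    assert session == [Lease("sleep-job", CANCELLED)]

    # The worker echoes the CANCELLED lease; it is removed and the new job is issued.
    session = service.update_bot_session(session)
    assert session == [Lease("echo-job", ACTIVE)]
```

The three assertions correspond to the expected results above: the lease is marked CANCELLED, it is removed once acknowledged, and the new job is executed.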