Remove existing worker if already registered in register_worker
Before raising this MR, consider whether the following are required, and complete if so:
-
Unit tests -
Metrics -
Documentation update(s)
If not required, please explain in brief why not.
Description
For the JobAssignmentThread
we keep a mapping of Worker
, an object we use to coordinate assignment of work between the assignment thread and waiting grpc requests, to bot_session.name
, a unique name given to each bot session on creation. However by using bot_session.name
, things get strange if two requests are being processed for the same bot_session.name
. This shouldn't happen, but it can if a bot cancels an in-flight UpdateBotSession
request and then sends a follow up UpdateBotSession
. In this case, the "cancelled" UpdateBotSession
is still processing in BuildGrid when the new UpdateBotSession
comes in. This causes the JobAssignmentThread to get very confused, leading to KeyError
being thrown due to a worker being removed from the worker_map
.
This PR addresses this by explicitly removing the worker in register_worker
if a worker already exists for this name. This makes it so various combinations of register_worker
-> remove_worker
when used from multiple threads with the same bot name doesn't cause the JobAssignmentThread
to get in a bad state. It's not perfect though, as it has some unfortunate behavior where a worker may be prematurely removed. Consider the following two request r1/r2
r1 -> register_worker(foo)
r1 -> wait_for_work() # blocking until event set or times out
r2-> register_worker(foo) # Replaces r1
r1 -> wait_for_work() times out
r1 -> remove_worker(foo) # foo is removed so no work will be assigned to r2
Having r2
not get any work and wait around until the ttl expires isn't great, but it is much better than the current behavior of the JobAssignmentThread
breaking. I couldn't come up with a cleaner way to solve this problem, and definitely happy to hear other thoughts.
Validation
Added a test to the job assigner to verify that calling register_worker
with the same name twice doesn't cause things to break horribly.