Replace Terminated field with TerminationProgress in WorkspaceAgentInfo in Agent
- Rails MR: Fix issue with workspaces in Terminating state (!118783 - merged)
- GA4K MR: Replace Terminated field with TerminationProgre... (#406565 - closed)
- Related frontend Rails issues:
Why
With Properly handle full and partial sync (#397702 - closed) merged, rails no longer bombards agentk with requests that it has already sent earlier. However, this has affected the way the agent reacts to terminated workspaces. (context)
Details of the problem
- As mentioned above, the changes introduced will prevent rails from constantly sending the same requests to agentk. This is especially relevant for when rails wishes to Terminate a Running workspace.
- Before the changes, the complete termination of a workspace worked in the following manner over the course of multiple partial sync cycles:
  - partial sync 1: agentk receives a termination request for a running workspace. Within agentk, the workspace is now being tracked in the persistentStateTracker.
  - partial sync 2: (assuming that the workspace has been terminated on the agentk side) agentk continues to receive the termination request for the same workspace. Since the workspace is no longer visible in the cluster, agentk tracks it in the terminatedTracker.
  - partial sync 3: agentk notifies rails of all the terminated workspaces in terminatedTracker, including the one terminated in the previous step. Upon successful acknowledgement from rails, these tracked workspaces are removed from terminatedTracker.
- There are 2 key takeaways from the above steps:
  - terminatedTracker tracks workspaces after they have been terminated in the cluster and is responsible for ensuring the eventual syncing/cleanup of agentk state for the workspace with rails in the next partial sync.
  - In the older implementation, receiving repeated termination requests from rails is key to cleanly terminating the workspace and leaving agentk in a consistent state. Without the follow-up termination request from rails in partial sync 2, agentk has no way to add the workspace to the terminatedTracker, and therefore no way to ensure that the terminated state is eventually synced with rails.
- As such, eliminating repeated termination requests will leave agentk in a state where it is unable to report the successful termination of workspaces.
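The failure mode described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the actual agentk code: `oldAgent`, `partialSync`, `simulate`, and the `existsInCluster` callback are hypothetical names, and rails is assumed to acknowledge every report immediately.

```go
package main

import (
	"fmt"
	"sort"
)

// oldAgent is a hypothetical, simplified model of the previous agentk behaviour.
type oldAgent struct {
	terminatedTracker map[string]bool
}

// partialSync reports the workspaces tracked in earlier cycles, then processes
// the termination requests received from rails in this cycle.
func (a *oldAgent) partialSync(requests []string, existsInCluster func(string) bool) []string {
	// Report everything tracked so far; assume rails acknowledges, so evict.
	var reported []string
	for name := range a.terminatedTracker {
		reported = append(reported, name)
	}
	sort.Strings(reported)
	for _, name := range reported {
		delete(a.terminatedTracker, name)
	}

	// A workspace only enters the tracker when a termination request arrives
	// after the workspace has already disappeared from the cluster.
	for _, name := range requests {
		if !existsInCluster(name) {
			a.terminatedTracker[name] = true
		}
	}
	return reported
}

// simulate runs three partial syncs and returns what is reported in the third.
func simulate(repeatRequests bool) []string {
	agent := &oldAgent{terminatedTracker: map[string]bool{}}
	inCluster := true
	exists := func(string) bool { return inCluster }

	agent.partialSync([]string{"workspace-A"}, exists) // sync 1: request arrives
	inCluster = false                                  // termination completes in the cluster

	var requests []string
	if repeatRequests {
		requests = []string{"workspace-A"}
	}
	agent.partialSync(requests, exists)   // sync 2: tracked only if the request repeats
	return agent.partialSync(nil, exists) // sync 3: report tracked workspaces
}

func main() {
	fmt.Println(simulate(true))  // [workspace-A]
	fmt.Println(simulate(false)) // []
}
```

With repeated requests the workspace is reported in the third sync; without them it never reaches the tracker, so nothing is ever reported.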
Solution
The following require modification in order to address this issue:
Update terminatedTracker in Agentk
As mentioned in the problem, the responsibility of the terminatedTracker so far has been to track workspaces after they have been terminated in the cluster and ensure these are synced with rails eventually.
Given that there will not be any repeated requests to terminate,
- terminatedTracker must track a workspace from the first instance a termination request is received by agentk.
- terminatedTracker must distinguish between workspaces that are either Terminating or Terminated. These two different states are required as they must be handled differently:
  - Terminating workspaces: For each termination request received from rails, an entry must be added for the affected workspace in terminatedTracker with this state. At the start of each partial sync, every Terminating workspace must be cross-checked with the cluster to verify whether it has been terminated. If yes, terminatedTracker must update the entry of the workspace to Terminated.
  - Terminated workspaces: During partial sync, each Terminated workspace in the tracker must be synced with rails as before. And as before, the entries for the workspace must be evicted upon successful acknowledgement from rails.
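The two-state tracker described above could look roughly like the following. This is a minimal sketch, not the actual agentk implementation: the map-based representation, the method names (`add`, `refresh`, `terminated`, `evict`), and the `existsInCluster` callback standing in for a real cluster lookup are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// TerminationProgress is the state of a tracked workspace.
type TerminationProgress string

const (
	Terminating TerminationProgress = "Terminating"
	Terminated  TerminationProgress = "Terminated"
)

// terminatedTracker maps a workspace name to its termination progress
// (hypothetical representation).
type terminatedTracker map[string]TerminationProgress

// add records a workspace on the first termination request from rails.
// An existing entry is left untouched so Terminated is never downgraded.
func (t terminatedTracker) add(name string) {
	if _, ok := t[name]; !ok {
		t[name] = Terminating
	}
}

// refresh runs at the start of a partial sync and promotes every Terminating
// workspace that is no longer present in the cluster to Terminated.
func (t terminatedTracker) refresh(existsInCluster func(name string) bool) {
	for name, progress := range t {
		if progress == Terminating && !existsInCluster(name) {
			t[name] = Terminated
		}
	}
}

// terminated returns the sorted names of workspaces to sync with rails.
func (t terminatedTracker) terminated() []string {
	var names []string
	for name, progress := range t {
		if progress == Terminated {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// evict removes Terminated entries that rails has acknowledged.
func (t terminatedTracker) evict(acked []string) {
	for _, name := range acked {
		if t[name] == Terminated {
			delete(t, name)
		}
	}
}

func main() {
	cluster := map[string]bool{"workspace-A": true}
	exists := func(name string) bool { return cluster[name] }

	tracker := terminatedTracker{}
	tracker.add("workspace-A") // the first and only termination request

	tracker.refresh(exists)
	fmt.Println(tracker.terminated()) // [] (still terminating)

	delete(cluster, "workspace-A") // the workspace disappears from the cluster
	tracker.refresh(exists)
	fmt.Println(tracker.terminated()) // [workspace-A]

	tracker.evict(tracker.terminated()) // rails acknowledged the report
	fmt.Println(len(tracker))           // 0
}
```

Because the entry is created on the first request, the tracker no longer depends on rails repeating the termination request in a later sync.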
Update Rails request structure in Agentk
Currently, we are only sending Terminated (a boolean) in WorkspaceAgentInfo in agentk. This should be an enumeration with the following values:
- Terminating
- Terminated
This will make the Terminating state an actual state reported by the agent. It is also helpful for knowing that a termination request is already in progress.
Example request schema:
{
"message_type": "workspace_updates",
"update_type": "partial",
"workspace_agent_infos": [{
"name": "workspace-A",
"termination_progress": "terminating" // or "terminated"
}]
}
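On the agentk side, the enumeration and the request body above might be modelled as follows. This is a hedged sketch: the JSON field names are taken from the example schema, but the `WorkspaceUpdates` struct, the constant names, the `buildPartialUpdate` helper, and the use of `omitempty` for workspaces that are not being terminated are assumptions, not the actual agentk types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TerminationProgress replaces the former boolean Terminated field.
type TerminationProgress string

const (
	TerminationProgressTerminating TerminationProgress = "terminating"
	TerminationProgressTerminated  TerminationProgress = "terminated"
)

// WorkspaceAgentInfo carries per-workspace state reported to rails.
type WorkspaceAgentInfo struct {
	Name string `json:"name"`
	// Assumption: the field is omitted when no termination is in progress.
	TerminationProgress TerminationProgress `json:"termination_progress,omitempty"`
}

// WorkspaceUpdates is a hypothetical name for the request envelope.
type WorkspaceUpdates struct {
	MessageType         string               `json:"message_type"`
	UpdateType          string               `json:"update_type"`
	WorkspaceAgentInfos []WorkspaceAgentInfo `json:"workspace_agent_infos"`
}

// buildPartialUpdate serialises a partial-sync request for the given workspaces.
func buildPartialUpdate(infos []WorkspaceAgentInfo) ([]byte, error) {
	return json.Marshal(WorkspaceUpdates{
		MessageType:         "workspace_updates",
		UpdateType:          "partial",
		WorkspaceAgentInfos: infos,
	})
}

func main() {
	body, err := buildPartialUpdate([]WorkspaceAgentInfo{
		{Name: "workspace-A", TerminationProgress: TerminationProgressTerminating},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Using a string-backed type keeps the wire format readable and leaves room for further states without changing the field's shape.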