Decide whether we should send config_to_apply when the workspace is in the Failed, Error, Unknown, Starting, Stopping, or Terminating actual state
Current situation
- Currently, the actual state calculation in Rails has no concept of an Error or Terminating state. There is already an issue to investigate the Terminating state. What is really the difference between the Failed and Error states now that we do not have DWO?
- When you create a workspace that is going to fail, it takes around 10 minutes for the status to be reported back as Failed. The reason is that the default .spec.progressDeadlineSeconds is 600 seconds. Between the time the config is first sent to agentk and the time, 10 minutes later, when the workspace is reported back as Failed, agentk makes multiple calls to Rails, and in each call it receives the same config to apply. Re-applying the config doesn't change anything. So what is the point of resending it every time?
- Similarly for Starting and Stopping, Rails has already determined the actual state of the workspace to be Starting or Stopping based on the information reported from agentk. However, Rails sends the config to agentk in every poll to process this information. Re-applying the config doesn't change anything. So what is the point of resending it every time?
- Unknown ideally should never occur, but if it does, it means that we do not know the current state. In this case, how does resending the config to agentk every time help?
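
To put a rough number on the redundancy described above, here is a minimal sketch that counts how many times the same config would be resent before the 600-second deadline expires. The polling interval is an assumption for illustration only (the actual agentk interval is not stated in this issue), and `redundant_config_sends` is a hypothetical helper, not existing code.

```python
import math

# Default .spec.progressDeadlineSeconds, as stated in the issue text.
PROGRESS_DEADLINE_SECONDS = 600


def redundant_config_sends(poll_interval_seconds: float,
                           deadline_seconds: float = PROGRESS_DEADLINE_SECONDS) -> int:
    """Count polls after the first one that receive an unchanged config.

    Assumes agentk polls at t = 0, interval, 2 * interval, ... and that
    every poll up to the deadline returns the same config.
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    # The poll at t = 0 delivers the config for the first time;
    # every later poll within the deadline is a repeat.
    return math.floor(deadline_seconds / poll_interval_seconds)


# Hypothetical 10-second interval: the same config is re-sent 60 times.
print(redundant_config_sends(10))
```

With a hypothetical 10-second interval, the config is re-applied 60 times with no effect before the workspace is reported as Failed.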
Proposal/Ideas
- Clarify the difference between the Failed and Error states, and remove one if required.
- If a workspace is in the Failed/Error/Unknown/Terminating state, do not send the config from Rails to agentk unless the user explicitly requests a Retry? This way these errors would surface more visibly and we would learn how frequently they occur. Maybe we do not allow the Retry. I'm just throwing out ideas.
- If a workspace is in the Starting/Stopping state, it means we have already determined that agentk is taking action on what we requested, so we should not bombard it again with the same request.
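
The ideas above could be sketched as a single decision function. This is only an illustration of the proposal, not the Rails implementation: the function name, the `retry_requested` flag, and the fallback for states not named in this issue are all assumptions. It also ignores the case where the desired state changes while a workspace is Starting or Stopping, which a real implementation would need to handle.

```python
# States where the proposal withholds the config unless the user asks to retry.
RETRY_ONLY_STATES = frozenset({"Failed", "Error", "Unknown", "Terminating"})

# States where agentk is already acting on the previous request.
IN_PROGRESS_STATES = frozenset({"Starting", "Stopping"})


def should_send_config(actual_state: str, retry_requested: bool = False) -> bool:
    """Return True if Rails should include config_to_apply in its response.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if actual_state in IN_PROGRESS_STATES:
        # agentk is already working on the request; do not resend it.
        return False
    if actual_state in RETRY_ONLY_STATES:
        # Only resend when the user has explicitly requested a retry.
        return retry_requested
    # Assumption: any other state keeps the current behaviour and sends the config.
    return True


for state in ("Failed", "Starting", "Unknown"):
    print(state, should_send_config(state), should_send_config(state, retry_requested=True))
```

Keeping the two groups of states in separate sets makes it easy to move a state between them, for example if Failed and Error are later merged or Retry is dropped.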
Edited by Vishal Tak