Decide whether we should send config_to_apply when the workspace is in the Failed, Error, Unknown, Starting, Stopping, or Terminating actual state

Current situation

  • Currently, the actual state calculation in Rails has no concept of an Error or Terminating state. There is already an issue to investigate the Terminating state. What is really the difference between the Failed and Error states now that we do not have DWO?
  • When you create a workspace that is going to fail, it takes around 10 minutes for the status to be reported back as Failed. The reason is that the default .spec.progressDeadlineSeconds is 600 seconds. Between the time the config is first sent to agentk and the time the workspace is reported back as Failed 10 minutes later, agentk makes multiple calls to Rails, and in each call it receives the same config to apply. Re-applying the config doesn't change anything, so what is the point of resending it every time?
  • Similarly for Starting and Stopping, Rails has already determined the actual state of the workspace to be Starting or Stopping based on the information reported from agentk. However, Rails still sends the config to agentk in every poll. Re-applying the config doesn't change anything, so what is the point of resending it every time?
  • Unknown should ideally never occur, but if it does, it means that we do not know the current state. In this case, how does resending the config to agentk every time help?

Proposal/Ideas

  • Clarify the difference between the Failed and Error states and remove one if required.
  • If a workspace is in the Failed/Error/Unknown/Terminating state, do not send the config from Rails to agentk unless the user explicitly requests a Retry. This way these errors would surface more visibly and we would learn how frequently they occur. Maybe we do not allow the Retry at all; I'm just throwing out ideas.
  • If a workspace is in the Starting/Stopping state, it means we have already determined that agentk is taking action on what we requested, so we should not bombard it again with the same request.
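
To make the proposal concrete, here is a minimal sketch of the decision it describes. It is written in Python for illustration only and is not the Rails implementation. The names ActualState, should_send_config and retry_requested are hypothetical, the Running state is included only as an example of a state outside this proposal, and Error and Terminating do not exist in the actual state calculation today.

```python
from enum import Enum


class ActualState(Enum):
    # Hypothetical enum; Error and Terminating are not calculated by Rails today.
    STARTING = "Starting"
    STOPPING = "Stopping"
    FAILED = "Failed"
    ERROR = "Error"
    UNKNOWN = "Unknown"
    TERMINATING = "Terminating"
    RUNNING = "Running"  # assumed example of a state outside this proposal


# agentk is already acting on the last request in these states.
IN_PROGRESS_STATES = {ActualState.STARTING, ActualState.STOPPING}

# Re-applying the same config is not expected to change anything in these states.
HALTED_STATES = {
    ActualState.FAILED,
    ActualState.ERROR,
    ActualState.UNKNOWN,
    ActualState.TERMINATING,
}


def should_send_config(actual_state, retry_requested=False):
    """Return True if Rails should include the config in the response to agentk."""
    if actual_state in IN_PROGRESS_STATES:
        # Never resend while agentk is still working on the request.
        return False
    if actual_state in HALTED_STATES:
        # Only resend when the user explicitly asks for a retry (assumed flag).
        return retry_requested
    # Out of scope for this proposal; stands in for whatever Rails does today.
    return True
```

With this sketch, should_send_config(ActualState.FAILED) returns False, while should_send_config(ActualState.FAILED, retry_requested=True) returns True. If Retry is not allowed at all, the halted branch would simply return False.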
Edited by Vishal Tak