Improve agent performance
Current situation
Agent mode is much slower than SSH mode.
Desired outcome
In most cases, startup time aside, there should not be much difference between agent mode and SSH mode.
Analysis
When it has no task to perform, the agent sleeps, then checks again whether a task is available. This polling interval can be adjusted, but a sub-second value can generate a high number of requests. If there are many agents, or if they are mostly idle, this stresses the eventbus for no real benefit.
Moreover, the task assigned to an agent depends on the last information it returned (except for the very first step of a job), and computing this next task is not immediate, as it may involve other plugins, such as providers.
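To put the polling cost described above in numbers, here is a rough back-of-the-envelope sketch. It assumes every idle agent sends exactly one request per polling interval; the function name and the agent counts are illustrative only, not measurements.

```python
def idle_requests_per_second(agent_count: int, polling_interval_s: float) -> float:
    """Estimate the request rate produced by idle agents.

    Assumption: each idle agent sends exactly one polling request
    per interval, and nothing else.
    """
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return agent_count / polling_interval_s


if __name__ == "__main__":
    # Hypothetical fleet of 100 idle agents, with decreasing intervals.
    for interval in (5.0, 1.0, 0.5):
        rate = idle_requests_per_second(100, interval)
        print(f"interval={interval}s -> about {rate:.0f} requests/s")
```

The rate grows inversely with the interval, which is why shortening the interval alone does not scale well.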
Solution
Let us try with the following:
Using semaphores, the agent channel will wait a few seconds (up to 30?) for a new ExecutionCommand. If one is received in time, it is used to complete the POST request initiated by the agent. If nothing is available once the waiting delay has elapsed, the POST request is completed anyway.
- Initial value for the semaphore (associated with the agent registration): 0
- acquired upon receiving an execution result from the agent (with a timeout of 30s)
- released upon receiving an ExecutionCommand (except for the first step)
- released upon receiving the final ExecutionCommand for a job (so that the agent is not held for the next job)
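The rules above could be sketched as follows, using Python's standard `threading.Semaphore`. This is only a minimal illustration under assumptions: `AgentSlot`, its method names, the `final` flag, and the in-memory queue are hypothetical and do not come from the existing agent channel code. It also takes one possible reading of the last rule, where the final command of a job always releases the semaphore.

```python
import threading
from collections import deque
from typing import Optional

WAIT_TIMEOUT_SECONDS = 30  # upper bound suggested above, still to be confirmed


class AgentSlot:
    """Hypothetical per-agent state, created at agent registration."""

    def __init__(self) -> None:
        self._semaphore = threading.Semaphore(0)  # initial value: 0
        self._lock = threading.Lock()
        self._pending: deque = deque()

    def on_execution_command(
        self, command: dict, first_step: bool = False, final: bool = False
    ) -> None:
        """Called when an ExecutionCommand targeting this agent arrives."""
        with self._lock:
            self._pending.append(command)
        # No release for the first step of a job: no POST is waiting yet.
        # Assumption: the final command of a job always releases.
        if final or not first_step:
            self._semaphore.release()

    def on_execution_result(
        self, timeout: float = WAIT_TIMEOUT_SECONDS
    ) -> Optional[dict]:
        """Called while handling the agent's POST of an execution result.

        Blocks until a command is released or the timeout elapses, then
        returns the next pending command, or None if there is none.
        """
        self._semaphore.acquire(timeout=timeout)
        with self._lock:
            return self._pending.popleft() if self._pending else None
```

With this shape, the handler of the agent's POST calls `on_execution_result()` and puts the returned command, if any, in its response, so the agent can start the next step without waiting for its next poll. A real implementation would also need to keep the semaphore count consistent when a command is picked up by a regular poll after a timeout.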