In a nutshell, we want to be able to fulfill user requests such as "Write code for issue #123 and create an MR". To do that, we're going to write an agent that will clone the repository, write code and tests, and debug and fix issues. Since this code is untrusted, we need to write and run it in a secure code sandbox where the LLM can install dependencies, run tests, etc. We were wondering whether we can leverage the runner architecture to spin up these ephemeral code sandboxes. How much of a lift would it be to spin up an environment and have bi-directional communication with it? We want to reuse what we have for runners rather than building our own Kubernetes/Firecracker/Docker-container-based solution.
Proposal
Time-box discussion and investigation over the next couple of days to determine whether GitLab Runner can support this.
If this is a viable path, create implementation issues for the Runner teams, and we can discuss LoE and milestone to support it.
FYI @ayufan @josephburnett @DarrenEastman - this is an effort we are kicking off for Duo Workflow (a new feature set for fully autonomous agents in GitLab). We would like your help in understanding whether Runner technology (maybe even Step Runner) is a good fit for this use case. cc @cheryl.li @nicolewilliams
@jreporter Step Runner (with Workspaces) is a good fit for this use case.
The tasks of cloning the repo, writing and running tests, etc. can be described as steps. GitLab will be providing a canonical step for cloning the repo, which Runner injects into the user's job (in place of generating a shell script). And much of the user's testing workflow will eventually be described as steps.
@marshall007 has shown that AI Agents are capable of generating steps to solve tasks such as finding the weather. That makes sense because the inputs and outputs are well defined. We just need a lot more examples! And the CI catalog is going to be an ecosystem of reusable steps.
Step runner is not opinionated about where it runs. We will be installing it on GitLab Hosted Runners and injecting user jobs as step payloads over gRPC. So the exact same machines that run GitLab Runner jobs can easily run steps for AI.
However, it might be much nicer to use actual workspaces for development, as suggested by @vtak below. In that case, step-runner could run in the workspace and execute ad hoc steps generated by the AI. Or the AI could exec the binary at will.
Using step-runner as a base of operations in the workspace could solve the unstable SSH problem as well.
Challenge - SSH in workspaces is not stable
Step-runner allows reconnecting to an existing running request without disrupting the actual work. We designed it this way because connections between the Runner Manager and the ephemeral VM/pod can be interrupted (especially in Kubernetes). GitLab Runner will still connect to the VM over SSH, but it will just proxy to a gRPC service running on a local socket.
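To make the reconnect idea concrete, here is a minimal sketch in Python. It is purely illustrative (step-runner's real protocol is gRPC over a local socket, not this toy text exchange): a job keeps making progress behind a local Unix socket server while clients connect, drop, and reconnect without disturbing the work.

```python
import socket
import tempfile
import threading
import time


class JobServer:
    """Toy stand-in for step-runner listening on a local socket:
    the job keeps running regardless of client connections."""

    def __init__(self, sock_path):
        self.progress = 0
        self.done = False
        self.server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.server.bind(sock_path)
        self.server.listen(1)
        threading.Thread(target=self._work, daemon=True).start()
        threading.Thread(target=self._serve, daemon=True).start()

    def _work(self):
        # The actual job: unaffected by clients coming and going.
        for _ in range(10):
            time.sleep(0.01)
            self.progress += 1
        self.done = True

    def _serve(self):
        while True:
            conn, _ = self.server.accept()
            with conn:
                # Every (re)connection just reads the current status.
                conn.sendall(f"{self.progress},{self.done}".encode())


def query(sock_path):
    """One short-lived connection, like a Runner Manager reconnecting."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as c:
        c.connect(sock_path)
        progress, done = c.recv(64).decode().split(",")
    return int(progress), done == "True"
```

In the real design, the server side is step-runner's gRPC service on a local socket, and GitLab Runner reaches it by proxying over SSH, so a dropped SSH connection only loses the proxy, not the work.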
Once the workspace is ready, the AI agent will SSH into the workspace and perform certain actions:
Create a new git branch.
Use all the data given in the issue, the quick action prompt, and the cloned project's code, and send that information directly to the Code Suggestions endpoint. The PAT injected inside the workspace can be used for authentication/authorization.
Based on the response from the LLM, create the necessary files and run tests.
Commit and push. Create MR using GitLab APIs.
All these actions could be steps the AI could just call with inputs (endpoint, token, branch name, etc...).
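Sketched in Python (the step names and the registry shape here are hypothetical, not step-runner's actual interface), "steps the AI calls with inputs" might look like:

```python
# Illustrative registry: each action is a named step with declared
# inputs; the agent invokes steps by name and supplies the inputs.
STEPS = {}


def step(name, inputs):
    """Register a function as a step with a declared input contract."""
    def register(fn):
        STEPS[name] = (inputs, fn)
        return fn
    return register


@step("create_branch", inputs=["branch"])
def create_branch(branch):
    # Placeholder body; a real step would run `git checkout -b`.
    return f"created branch {branch}"


@step("create_mr", inputs=["endpoint", "token", "branch"])
def create_mr(endpoint, token, branch):
    # Placeholder body; a real step would call the GitLab API.
    return f"POST {endpoint}/merge_requests for {branch}"


def run_step(name, **inputs):
    """Validate declared inputs, then run the step."""
    declared, fn = STEPS[name]
    missing = set(declared) - set(inputs)
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return fn(**inputs)
```

The point of the declared-inputs contract is the one made above: because each step's inputs and outputs are well defined, an LLM can be asked to emit only a step name plus an inputs dictionary, which is much easier to validate than freeform shell.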
Could we lean on Remote Development workspaces to provide a secure platform for these agents to operate within?
The architecture of workspaces inherently supports sandboxing by running in containerized environments. This setup is suitable for testing and executing untrusted code.
Thanks for the ping here, @oregand. This is a fantastic overlap!
The workflow I'm imagining
Developer is assigned an issue.
The developer reads up and runs a quick action command on the issue, providing additional information if necessary.
That quick action will spin up a workspace. The workspace will have the GitLab Workflow extension auto-configured.
There needs to be some way to trigger the code generation/suggestion from within the workspace - this is the missing part.
Once code has been generated, open an MR and comment on the issue with the MR link (which includes the workspace link as well).
As a developer, you go to the workspace, verify the code, test it out, make additional changes if needed.
Once the MR is closed, the workspace will get terminated.
It also aligns with the team's eventual goal of "create a workspace for each MR". The flow above would tightly integrate the entire workflow (AI as well)! It will also help with adoption of Workspaces!
Regarding the technical side, I'm not sure how/when AI agents will work here. I'll have to catch up with @shekharpatnaik on it since he is well versed in both Workspaces and AI Agents.
After talking with @oregand on Slack, the following idea came up:
The AI agent is defined on the AI Gateway side. We can have it running as a sidecar, maybe.
AI agent will trigger workspace creation using GraphQL endpoints.
As part of workspace creation, the project is already cloned into the workspace. A PAT is also injected into the workspace, and the git configuration is already set to use this PAT for git operations. Note - the git CLI needs to be available in the container image from which the workspace is provisioned. That is a safe assumption because otherwise, what would they even be doing in a workspace!?
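One common way to wire an injected PAT into git over HTTPS is a `url.<base>.insteadOf` rewrite with the `oauth2` token user. A small illustrative sketch of that wiring (the actual mechanism workspaces use to inject the PAT may differ):

```python
from urllib.parse import urlsplit, urlunsplit


def pat_url(base_url, pat):
    """Embed a PAT into an HTTPS git URL as oauth2:<token>@host."""
    parts = urlsplit(base_url)
    netloc = f"oauth2:{pat}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


def git_config_rewrite(base_url, pat):
    """git command that rewrites plain URLs to authenticated ones, so
    every git operation in the workspace transparently uses the PAT."""
    return ["git", "config", "--global",
            f"url.{pat_url(base_url, pat)}.insteadOf", base_url]
```

With that one config entry in place, plain `git fetch`/`git push` against `https://gitlab.com/...` remotes authenticate with the injected token, and neither the agent nor the developer has to pass credentials per command.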
Once the workspace is ready, the AI agent will SSH into the workspace and perform certain actions:
Create a new git branch.
Use all the data given in the issue, the quick action prompt, and the cloned project's code, and send that information directly to the Code Suggestions endpoint. The PAT injected inside the workspace can be used for authentication/authorization.
Based on the response from the LLM, create the necessary files and run tests.
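As a rough sketch of that SSH session, the agent could assemble shell commands like the following. The Code Suggestions endpoint path is a placeholder, not a confirmed interface; the MR creation uses the standard REST `POST /projects/:id/merge_requests` call; `$PAT` stands for the token already injected into the workspace.

```python
def workspace_commands(branch, api_base, project_id):
    """Build the shell commands the agent would run over SSH inside
    the workspace. Endpoint paths other than merge_requests are
    illustrative placeholders."""
    return [
        # 1. new branch for the AI's changes
        f"git checkout -b {branch}",
        # 2. send issue/prompt/code context to the Code Suggestions
        #    endpoint (path is a placeholder)
        f'curl -s -H "Authorization: Bearer $PAT" {api_base}/code_suggestions',
        # 3. after writing files and running tests: commit and push
        'git add -A && git commit -m "Draft: AI-generated changes"',
        f"git push -u origin {branch}",
        # 4. open an MR via the REST API
        f'curl -s -X POST -H "PRIVATE-TOKEN: $PAT" '
        f"{api_base}/projects/{project_id}/merge_requests "
        f"-d source_branch={branch} -d target_branch=main "
        f'-d title="Draft: AI changes"',
    ]
```

Each of these commands maps onto one of the actions listed above, which is also why they fit so naturally as steps with inputs (branch name, endpoint, token) rather than freeform shell.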
Thanks for capturing that @vtak! I'll defer to @shekharpatnaik on whether that approach looks reasonable, or what we might be able to build on from it as an idea! 💚
Pini Wietchner changed title from Spike: Investigate using runners to support AI Agents "create code" workflows to Spike: Investigate using runners to support Duo workflows "create code" workflows
@DylanGriffith we already had the ability to run within a CI job and just disabled it for now since it's not a priority and there were security issues associated with it, correct?