Issue-to-MR should run CLI commands
Problem to solve
A flow that cannot run terminal (CLI) commands is severely limited. Above all, it has no good way to verify its own changes: it cannot run tests or execute the code it created to check whether it works.
Desired Outcome
Issue-to-MR performance improves once the flow is able to run commands.
Proposal
- Add run_command to the available tools, either during execution or in a separate review phase.
- Measure the change in quality on SWE-bench and optimize from there.
Note: It could be worth trying to remove some of our native tools (e.g. list_files) and seeing how the graph performs with only "run_command".
Edited by Sebastian Rehm