Issue-to-MR should run CLI commands

Problem to solve

A flow that cannot run terminal (CLI) commands is severely limited. The main problem is that it cannot properly verify its own changes: it cannot run tests or execute the code it wrote to check whether it works.

Desired Outcome

Issue-to-MR performance improves once the flow is able to run commands.

Proposal

  1. Add run_command to the available tools, either during execution or in a separate review phase.
  2. Measure the change in quality on SWE-bench and optimize from there.
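
As a rough illustration of step 1, here is a minimal sketch of what a run_command tool could look like. It is not the actual implementation: the `CommandResult` type, the parameter names (`cwd`, `timeout_s`), and the exit-code conventions are all assumptions made for this example. The sketch runs a single command without a shell, applies a timeout, and returns the exit code and captured output so the flow can feed them back to the model.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandResult:
    """Hypothetical result type handed back to the flow."""
    exit_code: int
    stdout: str
    stderr: str
    timed_out: bool = False


def run_command(command: str, cwd: Optional[str] = None,
                timeout_s: float = 60.0) -> CommandResult:
    """Hypothetical run_command tool: run one command and capture its output."""
    # Split into an argument list so no shell is involved (no pipes, no globbing).
    args = shlex.split(command)
    if not args:
        return CommandResult(exit_code=-1, stdout="", stderr="empty command")
    try:
        proc = subprocess.run(
            args,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return CommandResult(-1, "", f"timed out after {timeout_s}s", timed_out=True)
    except OSError as exc:
        # Covers a missing executable, a missing working directory, and similar.
        # 127 mirrors the shell convention for "command not found" (an assumption here).
        return CommandResult(127, "", str(exc))
    return CommandResult(proc.returncode, proc.stdout, proc.stderr)
```

In a review phase, the flow could for example call `run_command("pytest -q", cwd=repo_dir)` after applying its edits and decide from `exit_code` whether another iteration is needed (`pytest` and `repo_dir` are placeholders). A real tool would additionally need sandboxing and output truncation, which this sketch leaves out.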

Note: It could even be worth trying to remove some of the native tools we have (e.g. list_files) and see how the graph performs with just run_command.

Further details

Links / references

Edited by Sebastian Rehm