In #29 (closed) we added a tool for testing Duo Workflow on predefined tests. It should be possible to extend this tool to also support tests from SWE bench.
@achueshev - I created this issue based on your suggestion from the last sync-up call. Sounds like a good idea. Do you plan to investigate this option further, or should I take this one (either is fine with me)? I think it should be feasible to either import projects from GitHub or re-create only the needed resources (issues/comments), similar to the draft in gitlab-org/duo-workflow/testing/duo-workflow-tests!1 (closed).
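For the "re-create only the needed resources" option, one SWE bench instance could map to one GitLab issue created through the REST API (`POST /projects/:id/issues` with `title` and `description`). The sketch below only builds the request; the `instance_id`/`problem_statement` field names follow the public SWE bench dataset, while the target project path and the title format are hypothetical choices.

```python
import json
import urllib.parse
import urllib.request


def build_issue_request(gitlab_url: str, project_path: str, token: str,
                        instance: dict) -> urllib.request.Request:
    """Build (but do not send) a request creating a GitLab issue for one
    SWE bench instance. The title format is a hypothetical choice."""
    # GitLab accepts a URL-encoded "namespace/project" path instead of a numeric ID.
    project_id = urllib.parse.quote(project_path, safe="")
    url = f"{gitlab_url.rstrip('/')}/api/v4/projects/{project_id}/issues"
    payload = {
        "title": f"SWE-bench: {instance['instance_id']}",
        # The issue body is the original problem statement, copied verbatim.
        "description": instance["problem_statement"],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`; comments could be added the same way via the issue notes endpoint.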
Wdyt about just focusing on SWE bench lite for now? From what I understand, the full SWE bench will likely be time-consuming and costly to run. It can also contain issues that reference other issues or commit SHAs, which might be difficult to replicate.
The criteria used to select the SWE bench lite subset of problems should make them ideally suited for direct import into GitLab.
Run Duo Workflow with the prompt containing the problem statement
Note: Before running Duo Workflow, we need to clone the related projects and check out the base_commit.
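The per-instance preparation described above could look roughly like this minimal sketch. It assumes SWE bench's `repo` ("owner/name"), `base_commit` and `problem_statement` fields and that repositories are cloned from GitHub; the prompt template and helper names are hypothetical, not what the test tool actually uses.

```python
from __future__ import annotations

import subprocess


def setup_commands(instance: dict, workdir: str) -> list[list[str]]:
    """Git commands that clone the instance's repo and check out base_commit."""
    repo_url = f"https://github.com/{instance['repo']}.git"
    return [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", instance["base_commit"]],
    ]


def build_prompt(instance: dict) -> str:
    """Hypothetical prompt template wrapping the problem statement."""
    return (
        "Resolve the following issue in this repository:\n\n"
        + instance["problem_statement"].strip()
    )


def prepare_repo(instance: dict, workdir: str) -> None:
    """Run the setup commands; check=True stops on the first git failure."""
    for cmd in setup_commands(instance, workdir):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log or dry-run the setup before handing `build_prompt(instance)` to Duo Workflow.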
@achueshev I've updated the test run to also support SWE bench and return results in the expected format. The tests can be run just by setting the SWE dataset, e.g.:
It fails to build the Docker image. Content of the image build log:
2024-10-03 12:50:21,223 - INFO - ERROR: No matching distribution found for types-pkg_resources
2024-10-03 12:50:22,699 - INFO - ---> Removed intermediate container cfa46bb9b55e
2024-10-03 12:50:22,699 - ERROR - Error: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
2024-10-03 12:50:22,699 - ERROR - docker.errors.BuildError during sweb.env.x86_64.29ea89393f6403e072c9f5:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1