Extract tools from checkpoint for flows that don't mock requests
What does this merge request do and why?
This MR improves how tools are collected after a flow runs. Previously, tools were read from the log file. That works for SWEbench, where all requests are mocked, but it is incorrect for flows such as Duo Chat. With this change, if a flow doesn't mock requests (as is the case for most flows other than SWEbench), the tools are extracted from the checkpoint instead.
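The selection rule described above can be sketched roughly as follows. This is a minimal illustration only, not the code in this MR: `collect_tools`, `mocks_requests`, `read_from_log`, and `read_from_checkpoint` are hypothetical names, and the two readers are passed in as callables so the sketch makes no assumption about how the log file or the checkpoint is actually parsed.

```python
from typing import Callable, List


def collect_tools(
    mocks_requests: bool,
    read_from_log: Callable[[], List[str]],
    read_from_checkpoint: Callable[[], List[str]],
) -> List[str]:
    """Choose the tool source based on whether the flow mocks requests.

    All names here are hypothetical and only illustrate the branching.
    """
    if mocks_requests:
        # Flows that mock all requests (e.g. SWEbench): read from the log file.
        return read_from_log()
    # Flows that don't mock requests (e.g. Duo Chat): read from the checkpoint.
    return read_from_checkpoint()
```

Passing the readers as callables keeps the branching testable in isolation; the real implementation may structure this differently.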
How to set up and validate locally
poetry run cef agent-platform evaluate .gitlab/agent_platform_templates/duo_chat.yaml --existing-experiment=5f0b6ec8-9b69-4e79-8bbc-bb3349d4e65c
Merge request checklist
- I've run the affected pipeline(s) to validate that nothing is broken.
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Edited by Alexander Chueshev
