Draft: PoC: langsmith/pytest integration tests of executor agent tool use
What does this MR do and why?
Proof of concept of LangSmith/pytest integration tests that exercise the behavior of Duo Workflow Service components with real LLM calls, without having to run a full workflow end-to-end.
This example focuses on a few tests of Executor agent tool use. They test that the executor agent follows its prompt instructions to:
- get the plan first using the get_plan tool
- set the task status to completed after completing a task
- use the handover_tool after completing all tasks
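The three expectations above can be sketched as a check over the ordered list of tool names the agent called. This is a minimal illustration, not the code in this MR: `check_executor_tool_order` is a hypothetical helper, `set_task_status` is an assumed name for the status-update tool, and `read_file` is a placeholder for any ordinary work tool. Only `get_plan` and `handover_tool` come from the description above.

```python
def check_executor_tool_order(calls: list[str],
                              status_tool: str = "set_task_status") -> list[str]:
    """Return a list of violated expectations (empty when all hold).

    `calls` is the ordered list of tool names the agent invoked.
    `status_tool` is an assumed name for the tool that marks a task completed.
    """
    problems = []
    # Expectation 1: the plan is fetched before anything else.
    if not calls or calls[0] != "get_plan":
        problems.append("get_plan was not the first tool call")
    # Expectation 2: a task status update happens at some point.
    if status_tool not in calls:
        problems.append(f"{status_tool} was never called")
    # Expectation 3: the agent hands over once all tasks are done.
    if not calls or calls[-1] != "handover_tool":
        problems.append("handover_tool was not the last tool call")
    return problems


# Example: tool names as they might be extracted from the agent's messages.
observed = ["get_plan", "read_file", "set_task_status", "handover_tool"]
assert check_executor_tool_order(observed) == []
```

In a real test the list of names would come from the tool calls in the LLM's responses. Returning a list of violations rather than asserting inside the helper keeps the failure message readable when several expectations break at once.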
Note: The tests don't run in CI at the moment because they need a LangSmith token. You can run them locally like ordinary pytest tests, with one additional LangSmith environment variable set:
export LANGSMITH_TEST_SUITE="LLM Integration tests" # the name of the LangSmith dataset that the tests will be grouped into
poetry run python -m pytest "tests/duo_workflow_service/tools/test_executor_agent.py"
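Because the tests depend on LangSmith configuration, one option is to detect missing settings up front and skip the tests instead of failing. The sketch below is an assumption-laden illustration: `missing_langsmith_settings` is a hypothetical helper, and `LANGSMITH_API_KEY` is assumed to be the variable that carries the LangSmith token. Only `LANGSMITH_TEST_SUITE` is taken from the commands above.

```python
import os
from typing import Mapping, Optional


def missing_langsmith_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required environment variables that are unset or empty.

    LANGSMITH_API_KEY is an assumed name for the token variable;
    LANGSMITH_TEST_SUITE is the dataset name used in this MR's instructions.
    """
    env = os.environ if env is None else env
    required = ("LANGSMITH_API_KEY", "LANGSMITH_TEST_SUITE")
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_langsmith_settings({"LANGSMITH_TEST_SUITE": "LLM Integration tests"}))
```

A test module could pass the result to `pytest.mark.skipif`, so that runs without a token are reported as skipped rather than as errors.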
Here's an example set of results in LangSmith
Related issues
Edited by Mark Lapierre