Draft: PoC: langsmith/pytest integration tests of executor agent tool use

What does this MR do and why?

Proof of concept: LangSmith/pytest integration tests that exercise Duo Workflow Service components with real LLM calls, without running a full workflow end-to-end.

This example covers a few tests of the Executor agent's tool use. They check that the agent follows its prompt instructions to:

  • get the plan first using the get_plan tool
  • set the task status to completed after completing a task
  • use the handover_tool after completing all tasks
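The three expectations above could be checked against the ordered list of tool names the agent called. This is a minimal sketch, not the code in this MR: `check_executor_tool_order` is a hypothetical helper, `set_task_status` is an assumed name for the status tool (only `get_plan` and `handover_tool` are named above), and treating the handover as the final call is an assumption.

```python
from typing import List, Sequence

# Hypothetical tool name: the MR names get_plan and handover_tool,
# but not the tool used to update a task's status.
STATUS_TOOL = "set_task_status"


def check_executor_tool_order(tool_calls: Sequence[str]) -> List[str]:
    """Return the violated expectations; an empty list means all passed."""
    problems: List[str] = []

    # Expectation 1: the plan is fetched before anything else.
    if not tool_calls or tool_calls[0] != "get_plan":
        problems.append("get_plan was not the first tool call")

    # Expectation 2: the task status is updated at some point.
    if STATUS_TOOL not in tool_calls:
        problems.append(f"{STATUS_TOOL} was never called")

    # Expectation 3 (assumed): the handover is the final tool call.
    if not tool_calls or tool_calls[-1] != "handover_tool":
        problems.append("handover_tool was not the last tool call")

    return problems
```

In a test, the list of names would come from the tool calls in the agent's response to a real LLM call, and the assertion would be that the returned list is empty.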

Note: The tests don't run in CI at the moment because they need a LangSmith token. You can run them locally like any other pytest tests, after setting a LangSmith environment variable:

export LANGSMITH_TEST_SUITE="LLM Integration tests" # the name of the LangSmith dataset that the tests will be grouped into 
poetry run python -m pytest "tests/duo_workflow_service/tools/test_executor_agent.py"
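One way to keep such tests from failing where no token is available is to skip them explicitly. The sketch below assumes the token is supplied through the `LANGSMITH_API_KEY` environment variable (this MR does not say which variable holds it), and `langsmith_skip_reason` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def langsmith_skip_reason(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a skip reason when no LangSmith token is configured, else None."""
    # Assumption: the token is read from LANGSMITH_API_KEY.
    if not env.get("LANGSMITH_API_KEY"):
        return "LANGSMITH_API_KEY is not set"
    return None
```

The result could feed a `pytest.mark.skipif` condition, so that runs without a token report the tests as skipped rather than failed.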

Here's an example set of results in LangSmith


Edited by Mark Lapierre
