Draft: PoC: langsmith/pytest integration tests of executor agent tool use (!346) · Merge requests · GitLab.org / duo-workflow / Duo Workflow Service

What does this MR do and why?

Proof of concept of langsmith/pytest integration tests that exercise the behavior of Duo Workflow Service components with real LLM calls, without having to run a full workflow end-to-end.

This example focuses on a few tests of Executor agent tool use. They test that the executor agent follows its prompt instructions to:

get the plan first using the get_plan tool
set the task status to completed after completing a task
use the handover_tool after completing all tasks
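
The three expectations above can be written as plain checks over the ordered list of tool calls an agent run produced. The following is a minimal sketch, not the code in this MR: the `ToolCall` record, the `set_task_status` tool name, and its `task_id`/`status` arguments are assumptions made for illustration. Only `get_plan` and `handover_tool` are named in the description.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation recorded from an agent run (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def first_tool_is_get_plan(calls):
    """The agent should fetch the plan before calling any other tool."""
    return bool(calls) and calls[0].name == "get_plan"


def marks_task_completed(calls, task_id):
    """Some call to the assumed status tool marks the given task as completed."""
    return any(
        c.name == "set_task_status"  # assumed tool name
        and c.args.get("task_id") == task_id
        and c.args.get("status") == "Completed"
        for c in calls
    )


def hands_over_last(calls):
    """The final tool call should be the handover."""
    return bool(calls) and calls[-1].name == "handover_tool"


# Hand-written trace standing in for what a real LLM run might return.
example_trace = [
    ToolCall("get_plan"),
    ToolCall("set_task_status", {"task_id": "1", "status": "Completed"}),
    ToolCall("handover_tool", {"summary": "all tasks done"}),
]

assert first_tool_is_get_plan(example_trace)
assert marks_task_completed(example_trace, "1")
assert hands_over_last(example_trace)
```

In a real test the trace would come from invoking the executor agent against the LLM. Keeping the checks as small pure functions means they can be reused across test cases and verified without any network access.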

Note: The tests don't run in CI at the moment because they need a LangSmith token. You can run them locally like any other pytest tests, with one additional LangSmith environment variable set:

export LANGSMITH_TEST_SUITE="LLM Integration tests" # the name of the LangSmith dataset that the tests will be grouped into 
poetry run python -m pytest "tests/duo_workflow_service/tools/test_executor_agent.py"
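
Because the tests depend on a LangSmith token, one option is to skip them cleanly when no token is configured, so they could live alongside the regular suite. The helper below is a sketch of that idea and is not part of this MR. It assumes the token is supplied through the `LANGSMITH_API_KEY` environment variable, and the function name is hypothetical.

```python
import os


def langsmith_skip_reason(env=None):
    """Return a skip reason if no LangSmith token is configured, else None.

    Assumes the token is provided via LANGSMITH_API_KEY.
    """
    env = os.environ if env is None else env
    if not env.get("LANGSMITH_API_KEY"):
        return "LANGSMITH_API_KEY is not set; skipping LangSmith integration tests"
    return None
```

The result could then feed a marker such as `pytest.mark.skipif(langsmith_skip_reason() is not None, reason="no LangSmith token")` on the test module.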

Here's an example set of results in LangSmith

Related issues

#244 (moved)

Edited Mar 18, 2025 by Mark Lapierre
