Projects with this topic
- Agent-shape testing harness that measures how an LLM-driven agent uses a tool's CLI, scored by an LLM judge.