Analytics agent automated testing plan
For DAP GA we want the Foundational Agents to have been minimally evaluated against the problems they were designed to solve (&19500 (comment 2889886842)).
We're currently using a test automation tool we developed to validate responses across multiple requests: https://gitlab.com/gitlab-org/analytics-section/platform-insights/duo-analytics-agent-prompt/-/tree/main/prompt-test-automator. The tool does not run automatically; it has to be run locally against a GDK instance (with Duo enabled and configured).
There is a bigger initiative for automated agent testing, but I don't think it will happen before GA: https://gitlab.com/gitlab-org/gitlab/-/issues/580874+
There is also a separate initiative for prompt validation that we might want to look at and possibly integrate with: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library
As part of this task we should:
- Add support for tool execution and validation to our prompt-test-automator
- Add more test cases to our prompt-test-automator
- Look into CES and determine whether it's something we want to integrate with
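To make the first item concrete, here is a minimal sketch of what a tool-execution assertion could look like. This is illustrative only: the `AgentResponse`/`ToolCall` shapes, the field names, and the example tool (`get_merge_requests`) are assumptions, not the actual prompt-test-automator API or the agent's real tool set.

```python
from dataclasses import dataclass, field

# Hypothetical response model; the real prompt-test-automator may
# capture tool calls in a different shape.
@dataclass
class ToolCall:
    name: str
    arguments: dict

@dataclass
class AgentResponse:
    text: str
    tool_calls: list = field(default_factory=list)

def validate_tool_calls(response, expected):
    """Check the agent invoked the expected tools, in order,
    with the expected argument keys (values are left unchecked
    since they often vary between runs)."""
    actual = [(c.name, set(c.arguments)) for c in response.tool_calls]
    wanted = [(name, set(args)) for name, args in expected]
    return actual == wanted

# Example: assert that a throughput question triggered one
# (hypothetical) data-fetching tool with the right parameters.
resp = AgentResponse(
    text="MR throughput for the last 30 days is ...",
    tool_calls=[ToolCall("get_merge_requests",
                         {"project_id": 1, "created_after": "30d"})],
)
assert validate_tool_calls(
    resp, [("get_merge_requests", ["project_id", "created_after"])]
)
```

A check like this, run over multiple requests per test case, would let us validate not just the final answer text but also that the agent chose the right tools along the way.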