Add evaluation for documentation questions with context
What does this merge request do and why?
I found this dataset; we need to check whether it is still in use and up to date: https://console.cloud.google.com/bigquery?ws=!1m5!1m4!4m3!1sdev-ai-research-0e2f8974!2sduo_chat!3sdocumentation_v3&project=ai-enablement-dev-69497ba7
I imported it into LangSmith and created a custom prompt for evaluation (I used Claude's prompt generation feature, it’s brilliant).
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
Merge request checklist
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.