Investigate using AI to evaluate search responses
Background
We want to change our Elasticsearch queries to improve the relevance of results, but we have no automated way to compare relevance before and after a change.
Proposals
Evaluate whether an AI platform could be used to evaluate search query/response accuracy. I'll list a few options below (feel free to add):
- LLM judge to grade search response accuracy
- LLM judge is used in qa_evaluationspec
- LLM judge is used in
- AI Framework group - Eval like I'm 5
- Take inspiration from Duo Chat prompt change evaluations
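
To make the LLM judge option more concrete, here is a rough sketch of what a before/after comparison could look like. Everything in it is an assumption for illustration: the names (`SearchCase`, `build_judge_prompt`, `parse_grade`, `compare`), the 1 to 5 grading scale, and the prompt wording are hypothetical, and `judge` stands in for whatever AI platform call we end up using. It is not how any existing spec or evaluation framework works.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class SearchCase:
    """One query plus the results returned by one version of the search query."""
    query: str
    results: Sequence[str]  # e.g. titles or snippets, in ranked order


def build_judge_prompt(case: SearchCase) -> str:
    # Hypothetical prompt wording; the real prompt would need iteration.
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(case.results, start=1))
    return (
        "You are grading search relevance.\n"
        f"Query: {case.query}\n"
        f"Results:\n{numbered}\n"
        "Reply with a single integer from 1 (irrelevant) to 5 (highly relevant)."
    )


def parse_grade(reply: str) -> int:
    # Take the first standalone digit between 1 and 5 in the judge's reply.
    match = re.search(r"\b[1-5]\b", reply)
    if match is None:
        raise ValueError(f"no grade found in judge reply: {reply!r}")
    return int(match.group())


def mean_grade(cases: Sequence[SearchCase], judge: Callable[[str], str]) -> float:
    # `judge` is an assumed callable: prompt text in, raw model reply out.
    return mean(parse_grade(judge(build_judge_prompt(c))) for c in cases)


def compare(
    before: Sequence[SearchCase],
    after: Sequence[SearchCase],
    judge: Callable[[str], str],
) -> float:
    """Positive means the judge rated the changed query higher on average."""
    return mean_grade(after, judge) - mean_grade(before, judge)
```

The judge is passed in as a plain callable so the comparison logic can be tested with a stub and so the choice of AI platform stays open. In practice `before` and `after` would hold results for the same set of queries, captured with the old and new Elasticsearch query respectively.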
Edited by Terri Chu