Evaluate using AI to assess search responses

Background

We want to change our Elasticsearch queries to improve the relevance of results, but we have no automated way to compare result quality before and after a change.

Proposals

Evaluate whether an AI platform could be used to judge how accurately search responses match their queries. I'll list a few options below (feel free to add):

  • LLM judge to grade search response accuracy
    • An LLM judge is used in the qa_evaluation spec
  • AI Framework group - Eval like I'm 5
  • Take inspiration from Duo Chat prompt change evaluations
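
To make the LLM judge option more concrete, here is a rough sketch of how a before/after comparison could work. Everything in it is an assumption for illustration: the prompt wording, the 1 to 5 scale, and the names `build_prompt`, `parse_score`, `grade`, and `compare` are hypothetical and are not taken from the qa_evaluation spec or the Duo Chat evaluations. The `judge` argument stands in for whatever LLM client we end up using, and `search_before` / `search_after` stand in for the old and new Elasticsearch queries.

```python
import re
from statistics import mean
from typing import Callable, Sequence

# Hypothetical prompt wording and 1-5 scale, chosen only for this sketch.
JUDGE_PROMPT = (
    "You are grading search results.\n"
    "Query: {query}\n"
    "Results:\n{results}\n"
    "Rate how well the results answer the query from 1 (irrelevant) "
    "to 5 (highly relevant). Reply with the number only."
)

Judge = Callable[[str], str]               # prompt in, raw LLM reply out
Search = Callable[[str], Sequence[str]]    # query in, result titles/snippets out


def build_prompt(query: str, results: Sequence[str]) -> str:
    """Render the query and a numbered list of results into the judge prompt."""
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(results, start=1))
    return JUDGE_PROMPT.format(query=query, results=numbered)


def parse_score(reply: str) -> int:
    """Extract the first standalone digit 1-5 from the judge's reply."""
    match = re.search(r"\b([1-5])\b", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return int(match.group(1))


def grade(judge: Judge, query: str, results: Sequence[str]) -> int:
    """Ask the judge to score one query/response pair."""
    return parse_score(judge(build_prompt(query, results)))


def compare(judge: Judge, queries: Sequence[str],
            search_before: Search, search_after: Search) -> dict:
    """Average the judge's scores for the old and new search over the same queries."""
    before = mean(grade(judge, q, search_before(q)) for q in queries)
    after = mean(grade(judge, q, search_after(q)) for q in queries)
    return {"before": before, "after": after, "delta": after - before}
```

In practice `compare` would be run over a fixed set of representative queries, so a query change could be accepted or rejected based on whether `delta` is positive. The judge's scores would still need a spot check against human ratings before we rely on them.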
Edited Jun 05, 2024 by Terri Chu