Add missing answer in evaluating template
What does this merge request do and why?
The GPT-4 evaluating template does not include the model answer to be evaluated. This causes invalid evaluations (low scores) since there is no answer to judge.
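A minimal sketch of the shape of the fix, assuming the template is a plain format string (the template text, field names, and helper below are hypothetical, not the actual prompt in this repo): the evaluation prompt must interpolate the model's answer, otherwise the judge has nothing to score.

```python
# Hypothetical evaluation template; the real template in this repo differs,
# but the key point is the `{answer}` placeholder that was missing.
EVALUATION_TEMPLATE = """\
Question:
{question}

Answer to evaluate:
{answer}

Rate the answer's correctness, readability, and comprehensiveness.
"""


def build_evaluation_prompt(question: str, answer: str) -> str:
    # Without the `answer` field the judge sees no answer at all and
    # returns low scores ("root cause analysis is missing").
    return EVALUATION_TEMPLATE.format(question=question, answer=answer)
```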
Below are the counts of tests with correctness = 1. We are only addressing gpt-4o as part of this MR.
| count | answering_model | evaluating_model |
|---|---|---|
| 7 | gpt-4o | gpt-4o |
| 13 | claude-2.1 | gpt-4o |
| 18 | gpt-4-turbo | gpt-4o |
| 12 | claude-3-opus | gpt-4o |
| 15 | claude-3-haiku | gpt-4o |
| 14 | claude-3-sonnet | gpt-4o |
| 14 | code-bison@latest | gpt-4o |
| 9 | text-bison-32k@latest | gpt-4o |
| 1 | claude-3-haiku | text-bison-32k@latest |
| 89 | code-bison@latest | text-bison-32k@latest |
| 14 | text-bison-32k@latest | text-bison-32k@latest |
| 1 | claude-3-opus | text-bison-32k@latest |
An example explanation from an invalid evaluation output:

> The AI assistant's root cause analysis is missing, so I cannot evaluate its correctness, readability, or comprehensiveness. To provide a proper evaluation, the AI assistant should have identified the root cause of the CI job failure and explained it clearly.
Relates to #327
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
Merge request checklist
- I've run the affected pipeline(s) to validate that nothing is broken.
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.
Edited by Tan Le