Use precomputed answers in duo-chat eval (!479) · Merge requests · GitLab.org / ModelOps / AI Model Validation and Research / AI Evaluation / Prompt Library

Hongtao Yang requested to merge hyang/use-precomputed-answers into main May 28, 2024

What does this merge request do and why?

This MR adds the ability to use precomputed answers (generated by the inference pipeline) in the evaluation pipeline, saving us a ton of LLM calls and significantly reducing the eval pipeline run time.
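To illustrate the idea, here is a minimal sketch of reusing stored answers instead of calling the answering model again. It is not the code from this MR: `collect_answers`, the `precomputed` dict keyed by question text, and the fallback to a live call for missing questions are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List


def collect_answers(
    questions: Iterable[str],
    precomputed: Dict[str, str],
    generate_answer: Callable[[str], str],
) -> List[dict]:
    """Return one record per question, reusing stored answers when available.

    Hypothetical helper: `precomputed` stands in for rows read from the
    answers table written by the inference pipeline.
    """
    records = []
    for question in questions:
        if question in precomputed:
            # Reuse the stored answer; no LLM call is made for this question.
            answer, source = precomputed[question], "precomputed"
        else:
            # Assumed fallback: call the model only when nothing is stored.
            answer, source = generate_answer(question), "generated"
        records.append({"question": question, "answer": answer, "source": source})
    return records
```

With every question present in `precomputed`, `generate_answer` is never invoked, which is where the savings in LLM calls would come from; only the judging models still need to run.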

Example config for using a precomputed table:

{
  "beam_config": {
    "pipeline_options": {
      "runner": "DataflowRunner",
      "project": "dev-ai-research-0e2f8974",
      "region": "us-central1",
      "temp_location": "gs://prompt-library/tmp/",
      "save_main_session": true,
      "sdk_container_image": "us-central1-docker.pkg.dev/dev-ai-research-0e2f8974/prompt-library/hyang-runner:dev-0.6.1",
      "sdk_location": "container",
      "subnetwork": "regions/us-central1/subnetworks/default"
    }
  },
  "precomputed_answers": ["dev-ai-research-0e2f8974.duo_chat_experiments.hyang-gpt_4-issue_epic-answers"],
  "output_sinks": [
    {
      "type": "bigquery",
      "path": "dev-ai-research-0e2f8974.duo_chat_experiments",
      "prefix": "hyang-gpt_4-issue_epic-eval"
    }
  ],
  "throttle_sec": 0.3,
  "batch_size": 10,
  "eval_setup": {
    "metrics": [
      {
        "metric": "independent_llm_judge",
        "evaluating_models": [
          {
            "name": "gpt-4o",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "gpt-4-independent",
                  "template_path": "data/prompts/duo_chat/evaluating/gpt-4-independent-user.example.txt",
                  "system_template_path": "data/prompts/duo_chat/evaluating/gpt-4-independent-system.example.txt"
                }
              ]
            }
          },
          {
            "name": "claude-2",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "claude-2",
                  "template_path": "data/prompts/duo_chat/evaluating/claude-2.txt.example"
                }
              ]
            }
          },
          {
            "name": "text-bison-32k@latest",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "text-bison",
                  "template_path": "data/prompts/duo_chat/evaluating/claude-2.txt.example"
                }
              ]
            }
          }
        ]
      }
    ]
  }
}
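The `precomputed_answers` entries above appear to be fully qualified BigQuery table names of the form `project.dataset.table`. A small sketch of how such a config could be checked before launching a run follows; `TableRef`, `parse_table_ref`, and `precomputed_tables` are hypothetical names for illustration and are not part of the Prompt Library.

```python
from typing import List, NamedTuple


class TableRef(NamedTuple):
    project: str
    dataset: str
    table: str


def parse_table_ref(ref: str) -> TableRef:
    """Split a 'project.dataset.table' string into its three parts."""
    parts = ref.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {ref!r}")
    return TableRef(*parts)


def precomputed_tables(config: dict) -> List[TableRef]:
    """Return the parsed precomputed-answer tables, or an empty list if none are set.

    Assumption: a config without the key simply has no precomputed answers.
    """
    return [parse_table_ref(ref) for ref in config.get("precomputed_answers", [])]
```

After loading the JSON with `json.load`, calling `precomputed_tables(config)` on the example would yield one `TableRef` whose `table` field is `hyang-gpt_4-issue_epic-answers`.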

This MR is only for duo-chat, but if we like this functionality, we can add it to code-suggestions and ETV later.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Relates to: #203

Merge request checklist

I've run the affected pipeline(s) to validate that nothing is broken.
Tests added for new functionality. If not, please raise an issue to follow up.
Documentation added/updated, if needed.

Edited Jun 19, 2024 by Hongtao Yang
