Use precomputed answers in duo-chat eval

Hongtao Yang requested to merge hyang/use-precomputed-answers into main

What does this merge request do and why?

This MR adds the ability to use precomputed answers (generated by the inference pipeline) in the evaluation pipeline, which saves a large number of LLM calls and significantly reduces the eval pipeline run time.

Example config for using a precomputed table:

{
  "beam_config": {
    "pipeline_options": {
      "runner": "DataflowRunner",
      "project": "dev-ai-research-0e2f8974",
      "region": "us-central1",
      "temp_location": "gs://prompt-library/tmp/",
      "save_main_session": true,
      "sdk_container_image": "us-central1-docker.pkg.dev/dev-ai-research-0e2f8974/prompt-library/hyang-runner:dev-0.6.1",
      "sdk_location": "container",
      "subnetwork": "regions/us-central1/subnetworks/default"
    }
  },
  "precomputed_answers": ["dev-ai-research-0e2f8974.duo_chat_experiments.hyang-gpt_4-issue_epic-answers"],
  "output_sinks": [
    {
      "type": "bigquery",
      "path": "dev-ai-research-0e2f8974.duo_chat_experiments",
      "prefix": "hyang-gpt_4-issue_epic-eval"
    }
  ],
  "throttle_sec": 0.3,
  "batch_size": 10,
  "eval_setup": {
    "metrics": [
      {
        "metric": "independent_llm_judge",
        "evaluating_models": [
          {
            "name": "gpt-4o",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "gpt-4-independent",
                  "template_path": "data/prompts/duo_chat/evaluating/gpt-4-independent-user.example.txt",
                  "system_template_path": "data/prompts/duo_chat/evaluating/gpt-4-independent-system.example.txt"
                }
              ]
            }
          },
          {
            "name": "claude-2",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "claude-2",
                  "template_path": "data/prompts/duo_chat/evaluating/claude-2.txt.example"
                }
              ]
            }
          },
          {
            "name": "text-bison-32k@latest",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "text-bison",
                  "template_path": "data/prompts/duo_chat/evaluating/claude-2.txt.example"
                }
              ]
            }
          }
        ]
      }
    ]
  }
}
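
As a rough illustration of how a pipeline might consume the `precomputed_answers` key, here is a minimal Python sketch. It is not the implementation in this MR: the helper names `parse_table_ref` and `precomputed_tables` are hypothetical, and it assumes each entry is a BigQuery-style `project.dataset.table` string, as in the example config above.

```python
import json
from typing import List, NamedTuple


class TableRef(NamedTuple):
    """Hypothetical container for one 'project.dataset.table' reference."""
    project: str
    dataset: str
    table: str


def parse_table_ref(ref: str) -> TableRef:
    """Split a 'project.dataset.table' string into its three parts."""
    parts = ref.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {ref!r}")
    return TableRef(*parts)


def precomputed_tables(config: dict) -> List[TableRef]:
    """Return the parsed precomputed-answer tables, or [] if none are configured."""
    return [parse_table_ref(r) for r in config.get("precomputed_answers", [])]


# Trimmed copy of the example config; only the keys used here are kept.
config = json.loads("""
{
  "precomputed_answers": [
    "dev-ai-research-0e2f8974.duo_chat_experiments.hyang-gpt_4-issue_epic-answers"
  ],
  "throttle_sec": 0.3,
  "batch_size": 10
}
""")

tables = precomputed_tables(config)
# Assumption: when at least one table is listed, answers are read from it
# instead of being generated with fresh LLM calls.
use_precomputed = bool(tables)
```

Validating the table reference up front means a malformed entry fails before a Dataflow job is launched, rather than partway through the run.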

This MR is only for duo-chat, but if we like this functionality, we can add it to code-suggestions and ETV later.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Relates to: #203

Merge request checklist

  • I've run the affected pipeline(s) to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Hongtao Yang