
Add support for Mixtral 8x 7B v0.1 model

Tan Le requested to merge add-mixtral-model into main

What does this merge request do and why?

This adds support for running evaluations with the Mixtral 8x 7B v0.1 model, which is hosted on Vertex AI endpoints.

Relates to #187 (closed)
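
For reference, models on Vertex AI endpoints are reached through the standard prediction API. The sketch below shows the rough shape of such a call; the endpoint path and the instance payload are illustrative assumptions, not promptlib's actual client code.

    from google.cloud import aiplatform

    # Project/region match the Beam config below; the endpoint ID is a
    # placeholder for wherever the Mixtral model is actually deployed.
    aiplatform.init(project="dev-ai-research-0e2f8974", location="us-central1")
    endpoint = aiplatform.Endpoint(
        "projects/dev-ai-research-0e2f8974/locations/us-central1/endpoints/<ENDPOINT_ID>"
    )

    # The instance schema depends on how the model was deployed; a
    # prompt/max_tokens payload is a common shape for text models.
    response = endpoint.predict(
        instances=[{"prompt": "[INST] Say hello. [/INST]", "max_tokens": 128}]
    )
    print(response.predictions[0])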

How to set up and validate locally

  1. Ensure GCP environment variables are set up.
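
     A quick way to confirm the variables are visible (names here are the common GCP defaults; your setup may use others):

    import os

    # GOOGLE_APPLICATION_CREDENTIALS points at a service account key file;
    # GOOGLE_CLOUD_PROJECT should match the project in the Beam config.
    for var in ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT"):
        print(var, "=", os.environ.get(var, "<unset>"))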

  2. Check out this merge request's branch.

  3. Copy the following config to a file, e.g. data/config/duochat_mixtral_test.json. Feel free to replace the local Duo Chat base URL with the production URL.

    {
      "beam_config": {
        "pipeline_options": {
          "runner": "DirectRunner",
          "project": "dev-ai-research-0e2f8974",
          "region": "us-central1",
          "temp_location": "gs://prompt-library/tmp/",
          "save_main_session": false
        }
      },
      "input_bq_table": "dev-ai-research-0e2f8974.duo_chat.sampled_code_generation_v1",
      "input_adapter": "mbpp",
      "output_sinks": [
        {
          "type": "local",
          "path": "data/output",
          "prefix": "experiment"
        }
      ],
      "throttle_sec": 1,
      "batch_size": 10,
      "eval_setup": {
        "answering_models": [
          {
            "name": "mixtral-8x-7b-instruct-01",
            "prompt_template_config": {
              "templates": [
                {
                  "name": "mixtral-instruct",
                  "template_path": "data/prompts/duo_chat/answering/mixtral-instruct.txt.example"
                }
              ]
            }
          },
          {
            "name": "duo-chat",
            "parameters": {
              "base_url": "http://gdk.test:8080"
            },
            "prompt_template_config": {
              "templates": [
                {
                  "name": "empty",
                  "template_path": "data/prompts/duo_chat/answering/empty.txt.example"
                }
              ]
            }
          }
        ],
        "metrics": [
          {
            "metric": "similarity_score"
          },
          {
            "metric": "independent_llm_judge",
            "evaluating_models": [
              {
                "name": "text-bison-32k@latest",
                "prompt_template_config": {
                  "templates": [
                    {
                      "name": "claude-2",
                      "template_path": "data/prompts/duo_chat/evaluating/claude-2.txt.example"
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    }
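
     Before kicking off the run, you can sanity-check that the file parses and names the expected models (a minimal sketch; the keys simply mirror the config above):

    import json

    with open("data/config/duochat_mixtral_test.json") as f:
        config = json.load(f)

    # Both answering models should be present before starting a run.
    models = [m["name"] for m in config["eval_setup"]["answering_models"]]
    assert {"mixtral-8x-7b-instruct-01", "duo-chat"} <= set(models)
    print("Answering models:", models)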
  4. Kick off a Duo Chat pipeline.

    ❯ poetry run promptlib duo-chat eval --config-file=data/config/duochat_mixtral_test.json --test-run --sample-size 1
    Requesting answers from mixtral-8x-7b-instruct-01: 1it [00:36, 36.49s/it]
    Requesting answers from duo-chat: 1it [00:14, 14.04s/it]
    Calculating similarity scores: 2it [00:03,  1.67s/it]
    Getting evaluation from text-bison-32k@latest: 4it [00:19,  4.96s/it]                                                                                                                           
    INFO:promptlib.common.beam.io:Output written to CSV: data/output/experiment_20240417_004059__independent_llm_judge-00000-of-00001.csv                                                           
    INFO:promptlib.common.beam.io:Output written to CSV: data/output/experiment_20240417_004059__similarity_score-00000-of-00001.csv
  5. Inspect the results.
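
     For a quick look, the output CSVs can be loaded directly (a sketch; it assumes pandas is available and makes no claims about the exact column names):

    import glob

    import pandas as pd

    # Load whichever CSVs the run just produced, one per metric.
    for path in sorted(glob.glob("data/output/experiment_*-of-*.csv")):
        df = pd.read_csv(path)
        print(path, df.shape)
        print(df.head())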

Merge request checklist

  • I've run the affected pipeline(s) to validate that nothing is broken.
  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.