Refactor the duo-chat docs to better organize it
## Problem to solve

@tle_gitlab and I reviewed the Duo Chat docs (`doc/how-to/run_duo_chat_eval.md`) and came up with a few ideas for improving them.
## Proposal
- When running `promptlib duo-chat eval --help`, add a link to the [configuration section of the docs](https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/run_duo_chat_eval.md#configuration-options-in-dataconfigduochat_eval_configjsonexample).
- Move the "Types of evaluations" section into "Configuration options".
- Move "Evaluation datasets" into "Types of evaluations" and add the dataset mapping to the input adapter.
- Move "Metrics" together with "Types of evaluations" to the top.
- Reformat "Configuration options" with collapsible sections to make it less verbose.
- Under `eval_setup`, add a link on model configuration pointing to a new markdown page dedicated to documenting the supported models and how to configure them:

  ```json
  {
    "name": "claude-2",
    "prompt_template_config": {
      "templates": [
        {
          "name": "empty",
          "template_path": "data/prompts/duo_chat/answering/empty.txt.example"
        }
      ]
    }
  }
  ```

- Merge "Configuration file" with "Configuration options".
- Update the GCP authentication instructions to use a personal GCP account with the `gcloud auth` command instead of a shared service account. Highlight that this is required to read from and write to BigQuery and to call Vertex AI models.
- Remove the Docker setup and promote running the tool locally.
- @tle_gitlab Expand the "Inspecting results" section to guide users on what to compare given the metrics selected for the evaluation. For example, if the similarity score is selected, compare it with the similarity score column in the control table.
- @tle_gitlab Add a section to the how-to doc indicating that tracing can be enabled (link to the document on docs.gitlab.com) and inspected after the evaluation.
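For the GCP authentication item above, the new instructions could sketch a command sequence along these lines. This is an assumption about the eventual doc content, not the final wording: it presumes the tool relies on Application Default Credentials for BigQuery and Vertex AI access, and the project ID is a placeholder.

```shell
# Sign in with your personal GCP account (opens a browser window).
gcloud auth login

# Create Application Default Credentials so client libraries
# (BigQuery, Vertex AI) can authenticate without a shared
# service-account key file.
gcloud auth application-default login

# Placeholder project ID -- replace with the evaluation project.
gcloud config set project YOUR_EVAL_PROJECT_ID
```

Documenting `gcloud auth application-default login` explicitly would address the read/write-to-BigQuery and call-Vertex-AI requirement called out in the proposal.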
## Further details

## Links / references