Refactor the duo-chat docs to better organize it
## Problem to solve

@tle_gitlab and I reviewed the Duo Chat docs (`doc/how-to/run_duo_chat_eval.md`) and came up with a few ideas for improving them.
## Proposal
- When running `promptlib duo-chat eval --help`, add a link to the [configuration section of the docs](https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/doc/how-to/run_duo_chat_eval.md#configuration-options-in-dataconfigduochat_eval_configjsonexample).
- Move the "Types of evaluations" section into "Configuration options".
- Move "Evaluation datasets" into "Types of evaluations" and add the dataset mapping to the input adapter.
- Move "Metrics" together with "Types of evaluations" to the top.
- Reformat "Configuration options" with collapsible sections to make it less verbose.
- Under `eval_setup`, add a link on model configuration pointing to a new markdown page dedicated to documenting the supported models and how to configure them:

  ```json
  {
    "name": "claude-2",
    "prompt_template_config": {
      "templates": [
        {
          "name": "empty",
          "template_path": "data/prompts/duo_chat/answering/empty.txt.example"
        }
      ]
    }
  }
  ```

- Merge "Configuration file" with "Configuration options".
- Update the GCP authentication instructions to use a personal GCP account with the `gcloud auth` command instead of a shared service account. Highlight that this is required to read from and write to BigQuery and to call Vertex AI models.
- Remove the Docker setup and promote running the tool locally.
- @tle_gitlab Expand the "Inspecting results" section to guide users on what to compare given the metrics selected for the evaluation. For example, if the similarity score is selected, compare it with the similarity score column in the control table.
- @tle_gitlab Add a section to the how-to doc indicating that tracing can be enabled (link to the document on docs.gitlab.com) and inspected after the evaluation.
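For the GCP authentication item above, the new instructions could sketch a command sequence along these lines. This is an assumption about the eventual doc content, not the final wording: it presumes the tool relies on Application Default Credentials for BigQuery and Vertex AI access, and the project ID is a placeholder.

```shell
# Sign in with your personal GCP account (opens a browser window).
gcloud auth login

# Create Application Default Credentials so client libraries
# (BigQuery, Vertex AI) can authenticate without a shared
# service-account key file.
gcloud auth application-default login

# Placeholder project ID -- replace with the evaluation project.
gcloud config set project YOUR_EVAL_PROJECT_ID
```

Documenting `gcloud auth application-default login` explicitly would address the read/write-to-BigQuery and call-Vertex-AI requirement called out in the proposal.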
## Further details

## Links / references