
Collect and track GitLab Duo Chat evaluations

What does this MR do and why?

Related to #429642 (closed)

  • Adds a new mode to scripts/duo_chat/reporter.rb ("the reporter script").
  • Adds a spec for the reporter script.
  • Updates the documentation to outline how the GitLab Duo Chat QA evaluation test works.

The reporter script is run by the CI job rspec-ee unit gitlab-duo-chat-qa pg14. It processes the job's CI artifacts and generates a Markdown report. The script also uploads its outputs to GitLab as snippets, an issue, or an MR note.

When the CI job rspec-ee unit gitlab-duo-chat-qa pg14 runs in the pipeline for a merge request, the following happens:

  1. rspec-ee unit gitlab-duo-chat-qa pg14 (for example, https://gitlab.com/gitlab-org/gitlab/-/jobs/5529680119) runs ee/spec/lib/gitlab/llm/chain/agents/zero_shot/qa_evaluation_spec.rb.

  2. ee/spec/lib/gitlab/llm/chain/agents/zero_shot/qa_evaluation_spec.rb saves the results of its run as CI artifacts (https://gitlab.com/gitlab-org/gitlab/-/jobs/5529680119/artifacts/browse/tmp/duo_chat/).

  3. scripts/duo_chat/reporter.rb is run. The script processes the artifacts to generate a Markdown report, then posts the report as a note on the MR. For an example, see this MR's note: !136799 (comment 1647272746).
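Step 3 can be sketched in Ruby as follows. This is a minimal illustration, not the actual reporter.rb code: it assumes each qa_*.json artifact holds a JSON array of evaluation records (the real schema may differ), and the build_report helper and the table layout are hypothetical.

```ruby
require "json"
require "tmpdir"

# Hypothetical sketch, not the actual reporter.rb logic: summarise the
# qa_*.json artifacts as a Markdown table. Assumes each artifact file holds
# a JSON array of evaluation records; the real schema may differ.
def build_report(artifacts_dir)
  files = Dir.glob(File.join(artifacts_dir, "qa_*.json")).sort
  lines = ["## GitLab Duo Chat QA evaluation", "", "| Artifact | Records |", "| --- | --- |"]
  total = 0

  files.each do |path|
    records = JSON.parse(File.read(path))
    total += records.size
    lines << "| #{File.basename(path)} | #{records.size} |"
  end

  lines << ""
  lines << "Total records across #{files.size} artifact(s): #{total}"
  lines.join("\n")
end

# Usage: write two fake artifacts to a temporary directory and print the report.
if __FILE__ == $PROGRAM_NAME
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "qa_1699866498.json"), JSON.generate([{}, {}]))
    File.write(File.join(dir, "qa_1699866646.json"), JSON.generate([{}]))
    puts build_report(dir)
  end
end
```

Sorting the glob results keeps the table order stable across runs, which makes reports from different pipelines easier to compare.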

This MR updates the reporter script so that when it runs in a pipeline for the master branch, it:

  1. uploads the artifacts as snippets (for example, https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/snippets/3621083),

  2. posts the Markdown report as an issue (for example, https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/9), and

  3. updates the tracker issue https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/1 with information extracted from the Markdown report.
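The reporter script detects a master-branch pipeline by comparing CI_COMMIT_BRANCH with CI_DEFAULT_BRANCH (see the testing steps below). A minimal Ruby sketch of that decision follows; the helper names and the action symbols are hypothetical labels for the behaviours listed above, not the script's real API.

```ruby
# Hypothetical sketch of the reporter's mode selection; names are illustrative.
# Documented behaviour: a pipeline counts as a default-branch (master) pipeline
# when CI_COMMIT_BRANCH equals CI_DEFAULT_BRANCH.
def default_branch_pipeline?(env = ENV)
  branch = env["CI_COMMIT_BRANCH"].to_s
  !branch.empty? && branch == env["CI_DEFAULT_BRANCH"]
end

# Maps each mode to the actions described in this MR.
def planned_actions(env = ENV)
  if default_branch_pipeline?(env)
    %i[upload_artifacts_as_snippets create_report_issue update_tracker_issue]
  else
    %i[post_report_as_mr_note]
  end
end

if __FILE__ == $PROGRAM_NAME
  p planned_actions
end
```

Comparing against CI_DEFAULT_BRANCH rather than a hardcoded "master" keeps the check working if the default branch is renamed. GitLab does not set CI_COMMIT_BRANCH in merge request pipelines, so in this sketch an MR pipeline falls back to the MR-note mode.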

How to test the change

To confirm that the script's existing functionality (posting the report as an MR note) still works, see the note !136799 (comment 1647272746).

To test that the script can collect and track the evaluations when it runs in a pipeline for the master branch, follow these steps:

  1. Create a new project access token at https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/settings/access_tokens.

    Use a short expiry, or delete the access token when you finish testing.

  2. Download the CI artifacts (the qa_*.json files) from https://gitlab.com/gitlab-org/gitlab/-/jobs/5529680119/artifacts/browse/tmp/duo_chat/.

    Place them in the tmp/duo_chat directory under your GDK's gitlab project root, like this:

    gitlab-development-environment/gitlab/tmp/duo_chat
    ├── qa_1699866498.json
    └── qa_1699866646.json

  3. Export these environment variables in the shell you will run the script from:

    export CI_PIPELINE_URL="https://gitlab.com/gitlab-org/gitlab/-/pipelines/17983572039847129234" # The value does not matter.
    export CI_COMMIT_SHA="foobar123" # The value does not matter.
    export CHAT_QA_EVALUATION_PROJECT_TOKEN_FOR_CI_SCRIPTS_API_USAGE="<access token>"

    # Important! The reporter script knows it's running in a `master` branch's pipeline by comparing these environment variables.
    export CI_COMMIT_BRANCH="master"
    export CI_DEFAULT_BRANCH="master"

  4. From the gitlab project root, run the script:

    ./scripts/duo_chat/reporter.rb

  5. Check https://gitlab.com/gitlab-org/ai-powered/ai-framework/qa-evaluation/-/issues/1 and confirm it has a new entry.
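Before running the reporter script locally, it can help to confirm the environment matches what the steps above expect. The following Ruby sketch is a hypothetical helper (preflight_problems is not part of reporter.rb) that lists unset environment variables, missing qa_*.json artifacts, and mismatched branch variables. It assumes you run it from the gitlab project root.

```ruby
# Hypothetical preflight check for the manual test; not part of reporter.rb.
REQUIRED_VARS = %w[
  CI_PIPELINE_URL
  CI_COMMIT_SHA
  CHAT_QA_EVALUATION_PROJECT_TOKEN_FOR_CI_SCRIPTS_API_USAGE
  CI_COMMIT_BRANCH
  CI_DEFAULT_BRANCH
].freeze

# Returns a list of human-readable problems; an empty list means the
# environment looks ready for ./scripts/duo_chat/reporter.rb.
def preflight_problems(env: ENV, artifacts_dir: "tmp/duo_chat")
  problems = REQUIRED_VARS.select { |name| env[name].to_s.empty? }
                          .map { |name| "#{name} is not set" }

  if Dir.glob(File.join(artifacts_dir, "qa_*.json")).empty?
    problems << "no qa_*.json files found in #{artifacts_dir}"
  end

  # Mirrors the documented check: master-branch mode requires these to match.
  if env["CI_COMMIT_BRANCH"] != env["CI_DEFAULT_BRANCH"]
    problems << "CI_COMMIT_BRANCH and CI_DEFAULT_BRANCH differ"
  end

  problems
end

if __FILE__ == $PROGRAM_NAME
  problems = preflight_problems
  puts(problems.empty? ? "Environment looks ready." : problems)
end
```

The helper reports problems instead of exiting, so it can be called from other scripts or an IRB session without terminating them.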

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by euko
