Add MergeRequestReader to chat
What does this MR do and why?
Adds support for one popular context that Duo Chat currently does not support, merge requests: https://gitlab.com/gitlab-org/ai-powered/duo-chat/discussions/-/issues/3+
Specifically, this MR addresses Support Merge Requests as context for Duo Chat ... (#464587).
Evaluation results
Here are the evaluation results from a collective LLM judge on the master branch.
Here are the results on this branch.
Here are the stats (averages):
Metric | Master branch | This branch |
---|---|---|
Correctness | 3.65 | 3.71 |
Readability | 3.75 | 3.75 |
Comprehensiveness | 3.43 | 3.53 |
The small improvements on the existing evaluation are likely not statistically significant, but they indicate that these changes do not degrade the answers to existing questions.
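To make the size of the change concrete, the Ruby sketch below computes the per-metric difference between the two runs using only the averages reported above. It is a back-of-the-envelope check, not a significance test: that would need the per-question scores, which are not included here. The constant and method names are illustrative.

```ruby
# Average scores copied from the evaluation results above (illustrative constants).
MASTER = { correctness: 3.65, readability: 3.75, comprehensiveness: 3.43 }.freeze
BRANCH = { correctness: 3.71, readability: 3.75, comprehensiveness: 3.53 }.freeze

# Per-metric difference (this branch minus master), rounded to two decimals
# so floating-point noise such as 0.06000000000000005 does not show up.
def score_deltas(before, after)
  before.to_h { |metric, score| [metric, (after.fetch(metric) - score).round(2)] }
end

deltas = score_deltas(MASTER, BRANCH)
deltas.each { |metric, delta| puts format("%-18s %+.2f", metric, delta) }

# No metric regresses when every delta is >= 0.
puts "no regression: #{deltas.values.all? { |d| d >= 0 }}"
```

With these numbers, correctness moves by +0.06, readability is unchanged, and comprehensiveness moves by +0.10, which is consistent with reading the result as "no degradation" rather than a clear improvement.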
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
Before | After |
---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- In the Rails console, fully enable the feature flag:
Feature.enable(:ai_merge_request_reader_for_chat)
- Visit a merge request and ask Duo Chat a question about it, for example: "Summarize this merge request".
What needs to be done
Task | Status | Notes |
---|---|---|
Run CEF locally on this branch and on master to ensure there is no degradation | See 'Evaluation results' above | |
Create basic dataset for MR eval | Here | |
Change Chat REST API to accept MR requests | | |
Add seed data for merge requests to be able to test locally in CEF | I believe this can be a follow-up | |
Make this work with the 'v2_chat_agent_integration' feature flag on | This MR just needs to be merged after 'v2_chat_agent_integration' has been rolled out | |
Ask the model validation team to add this dataset to daily runs, merge this MR, and monitor eval results | Asked them to add it (discussion) | |