Investigation on Duo Chat regression for Issue-epic
Summary
In our daily monitoring of the Duo Chat Dashboard for April 1st, we observed a drop in the similarity score from 0.84 to 0.81.
Details can be found in the dashboard: https://lookerstudio.google.com/reporting/151b233a-d6ad-413a-9ebf-ea6efbf5387b.
For issue/epic, more answers now have a score of 1 (16.85%).
Findings
group::ai model validation investigated with a cursory check of the results: https://docs.google.com/spreadsheets/d/12w9kx8bwp6EnefDVUo9ga5PmCxWYInlcatkiCthLWEA/edit#gid=700578206
The 2024-04-01 run contains more instances of the answer "I don't see how I can help. Please give better instructions!" (38 vs. 16), which degrades both the similarity score and the LLM judge score.
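To compare runs quickly, the fallback answers can be counted directly. The sketch below assumes each run's answers have been exported to a plain-text file with one answer per line; the file name and the `count_fallback` helper are hypothetical and not part of the evaluation tooling.

```shell
# Fallback answer text as quoted in the findings.
FALLBACK="I don't see how I can help. Please give better instructions!"

# count_fallback FILE
# Prints the number of lines in FILE that contain the fallback answer.
# Assumes one answer per line (hypothetical export format).
count_fallback() {
  # -F matches a fixed string, -c prints the number of matching lines.
  # grep exits non-zero when the count is 0, so its exit status is ignored.
  grep -cF -- "$FALLBACK" "$1" || true
}

# Example (hypothetical file name):
#   count_fallback answers_2024-04-01.txt
```

Running the helper on the exports of two runs should give counts comparable to the 38 vs. 16 reported here, provided the export really holds one answer per line.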
Steps to reproduce
As of 2024-04-02, I can no longer reproduce this issue in production. The request below now returns a proper summary instead of the fallback answer:
curl --request POST \
  --url https://gitlab.com/api/v4/chat/completions \
  --header "Authorization: Bearer $GL_PAT" \
  --header "Content-Type: application/json" \
  --data '{
    "content": "Summarize this Issue.",
    "resource_type": "issue",
    "resource_id": 113414743,
    "with_clean_history": true
  }'
This issue summarizes research done to validate the problem around offering metrics on Source Lines of Code (SLoC) per developer or per repository/group in GitLab. Through surveys, interviews, and literature review, the research found that SLoC per developer is not a good metric and should not be implemented. However, SLoC per directory/repo shows some value as a nice-to-have feature for individual contributors. Further research is recommended on SLoC per group to understand the use case from administrators. Overall, the problem validation resulted in a recommendation to not implement SLoC per developer due to potential negative consequences, but identified some interest in SLoC per location to help understand repositories.
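To check repeatedly whether the fallback answer comes back, the request above can be wrapped in a small script. This is a minimal sketch: the helper names `ask_duo_chat` and `is_fallback_answer` are hypothetical, and it assumes the fallback text appears verbatim in the response body.

```shell
# Fallback answer text as quoted in the findings.
FALLBACK="I don't see how I can help. Please give better instructions!"

# is_fallback_answer TEXT
# Succeeds (exit status 0) if TEXT contains the fallback answer.
is_fallback_answer() {
  printf '%s' "$1" | grep -qF -- "$FALLBACK"
}

# ask_duo_chat RESOURCE_ID
# Sends the same request as above for the given issue ID; requires $GL_PAT.
ask_duo_chat() {
  curl --silent --request POST \
    --url https://gitlab.com/api/v4/chat/completions \
    --header "Authorization: Bearer $GL_PAT" \
    --header "Content-Type: application/json" \
    --data "{
      \"content\": \"Summarize this Issue.\",
      \"resource_type\": \"issue\",
      \"resource_id\": $1,
      \"with_clean_history\": true
    }"
}

# Example:
#   answer=$(ask_duo_chat 113414743)
#   if is_fallback_answer "$answer"; then echo "fallback answer returned"; fi
```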
Next Steps
We suspect the regression is due to recent changes in GitLab and recommend that group::duo chat look further into any MRs merged within that time frame.
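One possible way to collect candidate MRs is the GitLab REST API. The sketch below is an assumption about how that search could be done, not an agreed procedure: the helper names are hypothetical, the project and time window are placeholders to fill in, and `updated_after`/`updated_before` filter on the update time, so the results still need to be checked against each MR's `merged_at` field. Only the first page of results is fetched.

```shell
# merged_mrs_url PROJECT AFTER BEFORE
# Builds the URL for listing merged MRs updated within a time window.
# PROJECT is a numeric ID or URL-encoded path; AFTER and BEFORE are
# ISO 8601 timestamps.
merged_mrs_url() {
  printf 'https://gitlab.com/api/v4/projects/%s/merge_requests?state=merged&updated_after=%s&updated_before=%s&per_page=100\n' "$1" "$2" "$3"
}

# list_merged_mrs PROJECT AFTER BEFORE
# Fetches the first page of matching MRs as JSON; requires $GL_PAT.
list_merged_mrs() {
  curl --silent --header "Authorization: Bearer $GL_PAT" \
    "$(merged_mrs_url "$1" "$2" "$3")"
}

# Example (placeholders, to be replaced with the project and window under review):
#   list_merged_mrs <project> <start-timestamp> <end-timestamp>
```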
