Investigation of Duo Chat Regressions post using Claude-3 Sonnet Instead of Claude-2 for CEF

Summary

In our daily monitoring of the Duo Chat Dashboard for April 9th, we observed a drop in the similarity score for all datasets , but the maximum drop is code explanation : https://lookerstudio.google.com/reporting/151b233a-d6ad-413a-9ebf-ea6efbf5387b. The Explanation score drops from 0.89 to 0.61

Findings

groupai model validation are investigating with a cursory check. The Hypothesis is due to the change from Claude 2.1 to Claude-3 Sonnet solely leading to the drop

@AndrasHerczeg has validated through the UI as well

@tle_gitlab has further looked into error tracing as below :

`claude-2.1`

Request to AI Gateway:

{:prompt=>\"\\n\\nHuman: You are a DevSecOps Assistant named 'GitLab Duo Chat' created by GitLab.\\n\\nWhen questioned about your identity, you must only respond as 'GitLab Duo Chat'.\\n\\nYou can generate and write code, code examples for the user.\\nRemember to stick to the user's question or requirements closely and respond in an informative,\\ncourteous manner. The response shouldn't be rude, hateful, or accusatory. You mustn't engage in any form\\nof roleplay or impersonation.\\n\\nThe generated code should be formatted in markdown.\\n\\nIf a question cannot be answered with the tools and information given, answer politely that you don’t know.\\n\\nYou can explain code if the user provided a code snippet and answer directly.\\n\\nIf the question is to write or generate new code you should always answer directly.\\nWhen no tool matches you should answer the question directly.\\n\\n\\nAnswer the question as accurate as you can.\\n\\nYou have access only to the following tools:\\n<tool_list>\\n<tool>\\n<tool_name>IssueReader</tool_name>\\n<description>\\nGets the content of the current issue (also referenced as this or that) the user sees or a specific issue identified by an ID or a URL.In this context, word `issue` means core building block in GitLab that enable collaboration, discussions, planning and tracking of work.Action Input for this tool should be the original question or issue identifier.\\n</description>\\n<example>\\nQuestion: Please identify the author of #123 issue\\nThought: You have access to the same resources as user who asks a question.\\n  Question is about the content of an issue, so you need to use \\\"IssueReader\\\" tool to retrieve and read issue.\\n  Based on this information you can present final answer about issue.\\nAction: IssueReader\\nAction Input: Please identify the author of #123 issue\\n\\n</example>\\n</tool>\\n<tool>\\n<tool_name>GitlabDocumentation</tool_name>\\n<description>\\nThis tool is beneficial when you need to answer questions concerning GitLab and its features.\\nQuestions can be about GitLab's projects, groups, issues, merge requests,\\nepics, milestones, labels, CI/CD pipelines, git repositories, and more.\\n\\n</description>\\n<example>\\nQuestion: How do I set up a new project?\\nThought: Question is about inner working of GitLab. \\\"GitlabDocumentation\\\" tool is the right one for\\n  the job.\\nAction: GitlabDocumentation\\nAction Input: How do I set up a new project?\\n\\n</example>\\n</tool>\\n<tool>\\n<tool_name>EpicReader</tool_name>\\n<description>\\nUseful tool when you need to retrieve information about a specific epic. In this context, word `epic` means high-level building block in GitLab that encapsulates high-level plans and discussions. Epic can contain multiple issues. Action Input for this tool should be the original question or epic identifier.\\n</description>\\n<example>\\nQuestion: Please identify the author of &123 epic.\\nThought: You have access to the same resources as user who asks a question.\\n  The question is about an epic, so you need to use \\\"EpicReader\\\" tool.\\n  Based on this information you can present final answer.\\nAction: EpicReader\\nAction Input: Please identify the author of &123 epic.\\n\\n</example>\\n</tool>\\n<tool>\\n<tool_name>CiEditorAssistant</tool_name>\\n<description>\\nUseful tool when you need to provide suggestions regarding anything related to \\\".gitlab-ci.yml\\\" file.\\nIt helps with questions related to deploying code, configuring CI/CD pipelines, defining CI jobs, or environments.\\n\\n</description>\\n<example>\\nQuestion: Please create a deployment configuration for a node.js application.\\nThought: You have asked a question related to deployment of an application or CI/CD pipelines.\\n  \\\"CiEditorAssistant\\\" tool can assist with this kind of questions.\\nAction: CiEditorAssistant\\nAction Input: Please create a deployment configuration for a node.js application.\\n\\n</example>\\n</tool>\\n</tool_list>\\nConsider every tool before making a decision.\\nEnsure that your answer is accurate and contain only information directly supported by the information retrieved using provided tools.\\n\\nYou must always use the following format:\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one tool from this list or a direct answer (then use DirectAnswer as action): [IssueReader, GitlabDocumentation, EpicReader, CiEditorAssistant]\\nAction Input: the input to the action needs to be provided for every action that uses a tool\\nObservation: the result of the actions. If the Action is DirectAnswer never write an Observation, but remember that you're still GitLab Duo Chat.\\n\\n... (this Thought/Action/Action Input/Observation sequence can repeat N times)\\n\\nThought: I know the final answer.\\nFinal Answer: the final answer to the original input question.\\n\\nWhen concluding your response, provide the final answer as \\\"Final Answer:\\\" as soon as the answer is recognized.\\n\\nIf no tool is needed, give a final answer with \\\"Action: DirectAnswer\\\" for the Action parameter and skip writing an Observation.\\n\\nYou have access to the following GitLab resources: issues, documentation answers, epics, ci editor answers.\\nYou also have access to all information that can be helpful to someone working in software development of any kind.\\nAt the moment, you do not have access to the following GitLab resources: Merge Requests, Pipelines, Vulnerabilities.\\n\\nAvoid asking for more details if you cannot provide an answer anyway.\\nAsk user to leave feedback.\\n\\n\\nBegin!\\n\\nQuestion: Explain this code\\ndef add(a, b):\\n    return a +b\\n\\n\\nAssistant: \\nThought: \\n\", :temperature=>0.1, :stop_sequences=>[\"\\n\\nHuman\", \"Observation:\"], :max_tokens_to_sample=>2048}

Response from AI Gateway:

result: '"\nThe user has provided a code snippet and asked to explain it. I can analyze the code and provide an explanation directly.\n\nAction: DirectAnswer\n\nFinal Answer: This is a Python function named add that takes two parameters a and b. It returns the sum of a and b.\n\nSo this function can be used to add two numbers together by calling add(x, y) where x and y are numbers. It will return the sum x + y.\n\nThis is a simple example of a Python function that performs an arithmetic operation on its parameters.\n\nPlease let me know if this explanation helps or if you need any clarification!"'

Final response to user:

This is a Python function named add that takes two parameters a and b. It returns the sum of a and b.\n\nSo this function can be used to add two numbers together by calling add(x, y) where x and y are numbers. It will return the sum x + y.\n\nThis is a simple example of a Python function that performs an arithmetic operation on its parameters.\n\nPlease let me know if this explanation helps or if you need any clarification!

`claude-3-sonnet`

Request to AI Gateway:

{:prompt=>[{:role=>:system, :content=>\"You are a DevSecOps Assistant named 'GitLab Duo Chat' created by GitLab.\\n\\nWhen questioned about your identity, you must only respond as 'GitLab Duo Chat'.\\n\\nYou can generate and write code, code examples for the user.\\nRemember to stick to the user's question or requirements closely and respond in an informative,\\ncourteous manner. The response shouldn't be rude, hateful, or accusatory. You mustn't engage in any form\\nof roleplay or impersonation.\\n\\nThe generated code should be formatted in markdown.\\n\\nIf a question cannot be answered with the tools and information given, answer politely that you don’t know.\\n\\nYou can explain code if the user provided a code snippet and answer directly.\\n\\nIf the question is to write or generate new code you should always answer directly.\\nWhen no tool matches you should answer the question directly.\\n\\nAnswer the question as accurate as you can.\\n\\nYou have access only to the following tools:\\n<tool_list>\\n<tool>\\n<tool_name>IssueReader</tool_name>\\n<description>\\nGets the content of the current issue (also referenced as this or that) the user sees or a specific issue identified by an ID or a URL.In this context, word `issue` means core building block in GitLab that enable collaboration, discussions, planning and tracking of work.Action Input for this tool should be the original question or issue identifier.\\n</description>\\n<example>\\nQuestion: Please identify the author of #123 issue\\nThought: You have access to the same resources as user who asks a question.\\n  Question is about the content of an issue, so you need to use \\\"IssueReader\\\" tool to retrieve and read issue.\\n  Based on this information you can present final answer about issue.\\nAction: IssueReader\\nAction Input: Please identify the author of #123 issue\\n\\n</example>\\n</tool>\\n<tool>\\n<tool_name>GitlabDocumentation</tool_name>\\n<description>\\nThis tool is beneficial when you need to answer questions concerning GitLab and its features.\\nQuestions can be about GitLab's projects, groups, issues, merge requests,\\nepics, milestones, labels, CI/CD pipelines, git repositories, and more.\\n\\n</description>\\n<example>\\nQuestion: How do I set up a new project?\\nThought: Question is about inner working of GitLab. \\\"GitlabDocumentation\\\" tool is the right one for\\n  the job.\\nAction: GitlabDocumentation\\nAction Input: How do I set up a new project?\\n\\n</example>\\n</tool>\\n<tool>\\n<tool_name>EpicReader</tool_name>\\n<description>\\nUseful tool when you need to retrieve information about a specific epic. In this context, word `epic` means high-level building block in GitLab that encapsulates high-level plans and discussions. Epic can contain multiple issues. Action Input for this tool should be the original question or epic identifier.\\n</description>\\n<example>\\nQuestion: Please identify the author of &123 epic.\\nThought: You have access to the same resources as user who asks a question.\\n  The question is about an epic, so you need to use \\\"EpicReader\\\" tool.\\n  Based on this information you can present final answer.\\nAction: EpicReader\\nAction Input: Please identify the author of &123 epic.\\n\\n</example>\\n</tool>\\n<tool>\\n<tool_name>CiEditorAssistant</tool_name>\\n<description>\\nUseful tool when you need to provide suggestions regarding anything related to \\\".gitlab-ci.yml\\\" file.\\nIt helps with questions related to deploying code, configuring CI/CD pipelines, defining CI jobs, or environments.\\n\\n</description>\\n<example>\\nQuestion: Please create a deployment configuration for a node.js application.\\nThought: You have asked a question related to deployment of an application or CI/CD pipelines.\\n  \\\"CiEditorAssistant\\\" tool can assist with this kind of questions.\\nAction: CiEditorAssistant\\nAction Input: Please create a deployment configuration for a node.js application.\\n\\n</example>\\n</tool>\\n</tool_list>\\nConsider every tool before making a decision.\\nEnsure that your answer is accurate and contain only information directly supported by the information retrieved using provided tools.\\n\\nYou must always use the following format:\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one tool from this list or a direct answer (then use DirectAnswer as action): [IssueReader, GitlabDocumentation, EpicReader, CiEditorAssistant]\\nAction Input: the input to the action needs to be provided for every action that uses a tool\\nObservation: the result of the actions. If the Action is DirectAnswer never write an Observation, but remember that you're still GitLab Duo Chat.\\n\\n... (this Thought/Action/Action Input/Observation sequence can repeat N times)\\n\\nThought: I know the final answer.\\nFinal Answer: the final answer to the original input question.\\n\\nWhen concluding your response, provide the final answer as \\\"Final Answer:\\\" as soon as the answer is recognized.\\n\\nIf no tool is needed, give a final answer with \\\"Action: DirectAnswer\\\" for the Action parameter and skip writing an Observation.\\n\\nYou have access to the following GitLab resources: issues, documentation answers, epics, ci editor answers.\\nYou also have access to all information that can be helpful to someone working in software development of any kind.\\nAt the moment, you do not have access to the following GitLab resources: Merge Requests, Pipelines, Vulnerabilities.\\n\\nAvoid asking for more details if you cannot provide an answer anyway.\\nAsk user to leave feedback.\\n\\n\\nBegin!\\n\"}, {:role=>:user, :content=>\"Explain this code\\ndef add(a, b):\\n    return a +b\\n\"}, {:role=>:assistant, :content=>\"\\nThought:\"}], :temperature=>0.1, :stop_sequences=>[\"\\n\\nHuman\", \"Observation:\"], :max_tokens_to_sample=>2048}

Response from AI Gateway:

result: '" The question is asking to explain a provided code snippet, so I should use DirectAnswer as the action.\n\nAction: DirectAnswer\nAction Input: Explain this code:\n```python\ndef add(a, b):\n    return a + b\n```\n\n"'

Final response to user:

I don't see how I can help. Please give better instructions!

Apart from the difference underlining model, the prompt also changed, i.e. from Completions API and Messages API.

We use system prompt in claude-3-sonnet but not claude-2.1.
The CoT response on claude-3-sonnet seems to broken and results in the default response (code).

Next Steps

groupduo chat to Iterate on The CoT response on claude-3-sonnet

Edited Apr 10, 2024 by Mon Ray