Investigation: Duo Chat code generation often responds with incomplete JavaScript even when the user asks for a Python function, while Claude 2 returns a structured, complete Python function
🔦 Objective

The objective is to experiment with Duo Chat until it reaches parity with the foundational model Claude in terms of quality, as measured by the similarity score. The screenshot below, from the current dashboard overview, shows where Duo Chat stands relative to Claude.
#⃣ Primary Metric for Success

The primary metric for success in this iteration of experimentation is the Comparison Similarity Score. This score compares the output generated by the Answering Model (Duo Chat) with the output generated by the Comparison Model (Claude).
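The issue does not say how the Comparison Similarity Score is computed. One common way to score two texts against each other is cosine similarity between their embedding vectors. The sketch below shows only that arithmetic, assuming the embeddings for the Duo Chat answer and the Claude answer have already been produced by some embedding model; it is an illustration, not the Prompt Library's implementation, and the vectors are toy values.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors.

    Hypothetical stand-in for the Comparison Similarity Score: the inputs are
    assumed to be embeddings of the Duo Chat answer and the Claude answer.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings", for illustration only.
duo_chat_vec = [0.2, 0.9, 0.1]
claude_vec = [0.3, 0.8, 0.4]
print(round(cosine_similarity(duo_chat_vec, claude_vec), 3))
```

Identical directions score 1.0 and orthogonal vectors score 0.0, so a higher score means the two answers are closer in embedding space, regardless of whether either answer is correct.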
📚 Dataset for Diagnostic Testing/Experimentation

The experiments use a small subset of the dashboard data, 45 code generation questions, available here: https://docs.google.com/spreadsheets/d/1zuDH8BGVPb5rtHOogmiAqH3kgEXYD8h3UfHVMfx_SwE/edit#gid=0
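The spreadsheet can also be exported as CSV and filtered programmatically before running experiments. A minimal sketch, assuming the export uses the same column headers as the example table in Experiment Details (for instance `chat task type` and `Question`); `load_rows_by_task_type` is a hypothetical helper, and the inline sample is illustrative, not real data.

```python
import csv
import io


def load_rows_by_task_type(csv_text: str,
                           task_type: str = "code_generation_non_gitlab") -> list[dict[str, str]]:
    """Return the CSV rows whose 'chat task type' column equals task_type.

    Assumes a CSV export whose header row matches the spreadsheet columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get("chat task type") == task_type]


# Illustrative two-row sample; "explain_code" is a hypothetical task type.
sample = (
    "chat task type,Question,Answering Model,Similarity_Score\n"
    "code_generation_non_gitlab,Write a function to find minimum of three numbers.,duo_chat,0.8\n"
    "explain_code,Explain this snippet.,duo_chat,0.6\n"
)
rows = load_rows_by_task_type(sample)
print(len(rows))  # 1
```

With a real export, read the file with `open(path, newline="", encoding="utf-8")` and pass its contents to the helper. Note that `csv.DictReader` returns every value as a string, so `Similarity_Score` needs a `float(...)` conversion before averaging.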
We have created three videos that walk through the idea behind this change and give an example of how to incorporate it into Duo Chat development:
- How to run end-to-end experiments, using trimmed prompts as an example (https://www.youtube.com/watch?v=H2oykA5THac)
- How the datasets are built and how to run the experiments (https://www.youtube.com/watch?v=swN2EtAzdWA)
- A general walkthrough of the end-to-end pipeline, to get familiar with accessing the Prompt Library (https://youtu.be/U2CW95yylMs)
🔍 Metrics

- Control Metric Score: Comparison Similarity (average similarity score): 0.57
- Experiment Metric Score: TBD post-experiment
- Variance 📶: TBD post-experiment
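The issue does not define how the experiment score or the variance will be computed. One plausible reading, sketched here as an assumption: average the per-question similarity scores of each run, then report the experiment mean minus the control mean (0.57 for the control run). `summarize_run` is a hypothetical helper, and the scores in the usage lines are toy values, not results.

```python
from statistics import mean


def summarize_run(control_scores: list[float], experiment_scores: list[float]) -> dict[str, float]:
    """Average per-question similarity scores for two runs and report the change.

    Assumption: 'Variance' in the Metrics list is read as the difference between
    the experiment mean and the control mean, not the statistical variance.
    """
    if not control_scores or not experiment_scores:
        raise ValueError("both runs need at least one score")
    control = mean(control_scores)
    experiment = mean(experiment_scores)
    return {"control": control, "experiment": experiment, "delta": experiment - control}


# Toy per-question scores for illustration only (not from the dashboard).
summary = summarize_run([0.5, 0.6, 0.7], [0.6, 0.7, 0.8])
print(summary)
```

If "Variance" is instead meant statistically, `statistics.variance` over the per-question scores of each run would be the natural substitute.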
✍🏼 Experiment Details

Recommendation: investigate a pattern in the input/response pairs where Duo Chat appears to 'think' the requested language is JavaScript and returns an incomplete answer, while Claude 2 returns structured, meaningful Python functions for basic code generation questions. One example from the spreadsheet is below:
| chat task type | Question | Answering Model | Comparison Model | Similarity_Score |
|---|---|---|---|---|
| code_generation_non_gitlab | Write a function to find minimum of three numbers. | duo_chat | Claude | 0.8 |

Answer (duo_chat):

```js
function findMin(num1, num2, num3) {
  return Math.min(num1, num2, num3);
}
```

Answer_comparison_Model (Claude):

Since there is no context provided, here is a function to find the minimum of three numbers:

```python
def findMin(num1, num2, num3):
    if (num1 < num2) and (num1 < num3):
        return num1
    elif (num2 < num1) and (num2 < num3):
        return num2
    else:
        return num3
```

This function takes in three numbers as arguments, compares them using if-elif-else statements, and returns the smallest number.
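To check whether the JavaScript answer above is an isolated case or a pattern across the 45 questions, one could tag the language of each answer's fenced code block and count the rows where Duo Chat and Claude disagree. A minimal sketch, assuming answers are stored as text with markdown fences as in the spreadsheet; `fence_languages`, `language_mismatch`, and the alias table are hypothetical helpers, not part of the Prompt Library.

```python
import re

# Captures the info string right after an opening ``` fence, e.g. "js" or "python".
# A closing fence followed by whitespace or end of text does not match.
FENCE_LANG = re.compile(r"```([A-Za-z0-9_+-]+)")

# Hypothetical alias table so "js" and "javascript" count as the same language.
ALIASES = {"js": "javascript", "py": "python"}


def fence_languages(answer: str) -> list[str]:
    """Return the normalized language tag of every fenced code block in an answer."""
    return [ALIASES.get(tag.lower(), tag.lower()) for tag in FENCE_LANG.findall(answer)]


def language_mismatch(answer: str, comparison_answer: str) -> bool:
    """True when the first tagged code blocks of the two answers use different languages."""
    ours = fence_languages(answer)
    theirs = fence_languages(comparison_answer)
    if not ours or not theirs:
        return False  # cannot decide unless both answers contain a tagged fence
    return ours[0] != theirs[0]


# Answers from the example row, flattened as in the spreadsheet; the Claude answer is abbreviated.
duo_chat_answer = "```js function findMin(num1, num2, num3) { return Math.min(num1, num2, num3); } ```"
claude_answer = ("Since there is no context provided, here is a function to find the minimum "
                 "of three numbers: ```python def findMin(num1, num2, num3): ... ```")

print(fence_languages(duo_chat_answer))  # ['javascript']
print(language_mismatch(duo_chat_answer, claude_answer))  # True
```

Running `language_mismatch` over every row and counting the `True` results would give a rough rate for the pattern; untagged fences are skipped rather than guessed, so the count is a lower bound.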