Iteration 3: Experiment with different prompts and compare how they perform on user satisfaction with the response
# Goal of this issue

1. Before releasing Explain Code as GA, we should ensure that user feedback is `helpful` in at least x% of cases and `wrong` in less than y% of cases. The values for x and y remain to be determined.
1. We are likely going to switch to a different AI vendor. We should compare how the initial vendor performs against the new vendor to inform the business decision, given the consequences for user satisfaction when switching vendors.

# Proposal

## Enhance the metrics for collecting user satisfaction with:

* `helpful`, `unhelpful`, `wrong` (already available as a result of https://gitlab.com/gitlab-org/gitlab/-/issues/404272+)
* number of lines (or characters or tokens) selected for explanation - this will help us understand if satisfaction is a function of the length of the code selected
* number of characters (or tokens) of the answer - this will help us understand if satisfaction is a function of the length of the answer
* language of the code selected - this will help us understand if satisfaction is a function of the code language
* the prompt used and whether the selected code was placed before or after the prompt - this will help us understand how different prompt designs perform - **do not collect the code itself or the answer from the AI** to avoid collecting customer or user data.
* allow users to add a text message to explain their sentiment about the response or the feature as such.
* count the total number of times that
  - users have received an AI answer vs.
  - the times they also chose to provide feedback
  - the times they asked a follow-up question
    - and did not give feedback
    - did give feedback

## Play with different prompts

* Use guidance like https://www.promptingguide.ai/ to engineer a handful of prompts.
* Randomly use the different prompts and different providers.
* Present the results in Sisense.
  - we intend to keep measuring user satisfaction beyond GA, to be able to adjust prompts when needed
* Use the best performing prompt going forward.
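As a rough sketch of the randomization and metadata-collection steps above (the prompt IDs, provider names, templates, and event fields below are all illustrative assumptions, not the real implementation - note that only metadata is recorded, never the selected code or the AI answer):

```python
import random

# Hypothetical prompt variants; IDs and wording are illustrative only.
PROMPTS = [
    {"id": "v1-code-first", "template": "{code}\n\nExplain the code above.", "code_position": "before"},
    {"id": "v2-instruction-first", "template": "Explain the following code:\n\n{code}", "code_position": "after"},
]
PROVIDERS = ["vendor_a", "vendor_b"]  # placeholder names for the two AI vendors

def assign_experiment(selected_code: str, language: str) -> dict:
    """Randomly pick a prompt variant and provider, and build the
    metadata-only payload we would track (no code, no AI answer)."""
    prompt = random.choice(PROMPTS)
    provider = random.choice(PROVIDERS)
    return {
        "prompt_id": prompt["id"],
        "code_position": prompt["code_position"],
        "provider": provider,
        "selected_lines": selected_code.count("\n") + 1,
        "selected_chars": len(selected_code),
        "language": language,
        # The user's rating (`helpful` / `unhelpful` / `wrong`), an optional
        # free-text message, and the answer length would be attached later,
        # once the response arrives and the user gives feedback.
    }
```

Grouping the resulting events by `prompt_id` and `provider` would then let the Sisense dashboards compare satisfaction rates per prompt design and per vendor.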