Customer MVC: natural language querying in visualization designer
## Problem to be solved

### User problem

Creating the SQL needed to create a visualization is difficult, so I don't create custom visualizations or dashboards for Product Analytics.

### Solution hypothesis

If GitLab provides an interface for users (Product/Engineering Managers) to ask questions about Product Analytics data, then they will create more custom visualizations and save them to custom dashboards.

### Assumption

_What assumptions are you making about this problem and the solution?_

* We assume customers have instrumented an app but are not getting value from the default dashboard.
* We assume customers have specific questions they can ask about, like "How many visitors did we have this week compared to last week?"
* We assume customers are more comfortable using plain English rather than writing a YML file to explore their data.

### Personas

_What _[_personas_](https://about.gitlab.com/handbook/product/personas/#list-of-user-personas)_ have this problem, who is the intended user?_

* [Parker, Product Manager](https://handbook.gitlab.com/handbook/product/personas/#parker-product-manager)
* [Delaney, Development Team Lead](https://handbook.gitlab.com/handbook/product/personas/#delaney-development-team-lead)
* [Sasha, Software Developer](https://handbook.gitlab.com/handbook/product/personas/#sasha-software-developer)

## Proposal

Product Analytics-specific implementation of natural language querying.

### Success

_How will you measure whether this experiment is a success?_

We can measure how many custom visualizations are created after launch of the experiment compared to before, per onboarded project. We can also measure engagement with the natural language querying feature itself. That instrumentation is tracked in https://gitlab.com/gitlab-org/gitlab/-/issues/442052+

## Possible Solutions

- Using langchain ([json agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/json.html)) for an implementation like https://www.vizgpt.ai/
- Utilizing a chat interface like how [Cube does with Delphi and Slack](https://cube.dev/blog/conversational-interface-for-semantic-layer)

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/393881+

## Feature release

### Main Job story

_What job to be done will this solve?_

As a user, I want to explore my data and create a custom visualization of it using plain words.

## Proposal updates/additions

### Problem validation

_What validation exists that customers have this problem?_

When we [interviewed users](https://www.youtube.com/playlist?list=PL05JrBw4t0KogNgPHowkNaXy5z1dEmZaj), one of them described how their PM and marketing teams would come up with different visualization requests and put them in a spreadsheet. The development teams would then have to build them from the plain text descriptions provided in the spreadsheet. This is a signal that there is a need for this type of capability: the PMs and marketing teams in this case have already written plain text descriptions of what they want, but currently have no way to create the code that produces the visualization.

### Business objective

_What business objective will be achieved with this proposal?_

Our business goal is to increase usage of Product Analytics so that customers get additional value from the offering, which will lead to account expansion and lower churn rates. Making it easier for users to get to insights and visualize the data in the way they desire gives them additional value from Product Analytics, which advances those goals.

### Confidence

_Has this proposal been derived from research?_

| Confidence | Research |
|------------|----------|
| Medium | https://www.youtube.com/playlist?list=PL05JrBw4t0KogNgPHowkNaXy5z1dEmZaj |

### Requirements

_What tasks or actions should the user be capable of performing with this feature?_

> :warning:️ Related feature and research issues should be linked in the related issues section (Delete this line when this is done)

#### The user needs to be able to:

- Have a way to enter a plain text description of the type of data they want to be displayed in a visualization
- Have a plain text way to specify any specific filtering, exclusion, grouping, etc. to be applied to the data
- Have a way to see the results of their query and the data that it returns

# Checklist

## Experiment

<details>
<summary>Issue information</summary>

- [x] Add information to the issue body about:
  - [x] The user problem being solved
  - [x] Why the solution hypothesis solves this problem
  - [x] Your assumptions have been defined
  - [x] Who it's for, list of personas impacted
  - [x] Your proposal has been defined
  - [x] Your success metrics have been defined
  - [x] UX maturity requirements have been measured
- [x] Add relevant designs to the Design Management area of the issue if available
- [x] Confirm that an unexpected outage of this feature will not negatively impact the application or other features
- [x] Add a feature flag so that this feature can be quickly disabled if/when needed
- [x] If this experiment introduces a new service or data store, ensure it is not processing or storing [red data](https://about.gitlab.com/handbook/security/data-classification-standard.html#data-classification-levels) without a security and if needed legal review
  - _NOTE_: We recommend using one of the already adopted models or data stores. If you need to use something else, be aware that using other models or data stores will require additional review during the feature stage for operational fitness and compliance.
- [ ] Completed the necessary steps to move from Experiment to Beta
- [x] Ensure this issue has the ~wg-ai-integration label to ensure visibility to various teams working on this
- [ ] Add the feature to https://gitlab.com/gitlab-org/gitlab/-/blob/master/doc/user/ai_features.md once it is ready to be released to customers as an Experiment

</details>

## Beta

<details>
<summary>Issue information</summary>

- [x] Add information to the issue body about:
  - [ ] The Main Job story and Small Jobs it's expected to satisfy have been stated
  - [ ] Your assumptions have been defined
  - [ ] Proposal has been updated as necessary
  - [ ] Problem validation information has been added
  - [ ] Business objective has been defined
  - [ ] Requirements have been defined
  - [ ] Success metrics have been defined
  - [ ] UX maturity requirements have been measured
- [ ] Add all related feature issues to the Linked items section
- [ ] Add all relevant solution validation issues to the Linked items section that show this proposal will solve the customer problem, or details explaining why it's not possible to provide that validation.
- [ ] Add relevant designs to the Design Management area of the issue.
- [ ] You have adhered to our [Definition of Done](https://docs.gitlab.com/ee/development/contributing/merge_request_workflow.html#definition-of-done) standards
- [ ] Completed the necessary steps to move from Beta to GA

</details>

## Generally available

<details>
<summary>Issue information</summary>

- [ ] Add information to the issue body about:
  - [ ] Your assumptions have been defined
  - [ ] Your proposal has been defined
  - [ ] Problem validation information has been added
  - [ ] Business objective has been defined
  - [ ] Confidence about this feature has been assessed and defined
  - [ ] Requirements have been defined
- [ ] Add all relevant solution validation issues to the Linked items section that show this proposal will solve the customer problem, or details explaining why it's not possible to provide that validation.
- [ ] Add relevant designs to the Design Management area of the issue.
- [ ] You have adhered to our [Definition of Done](https://docs.gitlab.com/ee/development/contributing/merge_request_workflow.html#definition-of-done) standards
- [ ] Ensure this issue has the ~wg-ai-integration label to ensure visibility to various teams working on this

</details>

<details>
<summary>Technical needs</summary>

- [ ] Please consider the operational aspects of the feature you are creating. A list of things to think about is in: https://gitlab.com/gitlab-org/gitlab/-/issues/403859. We will be improving this process in the future: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/117637#note_1353253349.
- [ ] @ mention your [AppSec Stable Counterpart](https://about.gitlab.com/handbook/product/categories/) and read the [AI secure coding guidelines](https://docs.gitlab.com/ee/development/secure_coding_guidelines.html#artificial-intelligence-ai-features)

1. Work estimate and skills needed to build a viable ML feature: Depending on the work, many personas contribute to building an ML feature, including Data Scientist, NLP engineer, ML Engineer, MLOps Engineer, ML Infra engineers, and Fullstack engineer to integrate the ML Services with GitLab. Post-prototype we would assess the skills needed to build a production-grade ML feature for the prototype.
2. Data Limitation: We would like to validate upfront whether we have viable data for the feature, including whether we can use the DataOps pipeline of ModelOps or need to create a custom one. We would want to understand the training data, test data, and feedback data to dial up the accuracy, as well as the limitations of the data.
3. Model Limitation: We would want to understand whether we can use an open-source pre-trained model, tune and customize it, or start a model from scratch. Further, we would assess, based on the ModelOps model evaluation framework, which model would be the right one for the use case.
4. Cost, Scalability, Reliability: We would want to estimate the cost of hosting, serving, inference of the model, and the full end-to-end infrastructure including monitoring and observability.
5. Legal and Ethical Framework: We would want to align with the legal and ethical framework, like any other ModelOps feature, to cover the nine principles of responsible ML and any legal support needed.

</details>

<details>
<summary>Dependency needs</summary>

- [ ] Please consider the operational aspects of the service you are creating. A list of things to think about is in: https://gitlab.com/gitlab-org/gitlab/-/issues/403859. We will be improving this process in the future: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/117637#note_1353253349.

</details>

<details>
<summary>Legal needs</summary>

- [ ] TBD

</details>

## Additional resources

- If you'd like help with technical validation, or would like to discuss UX considerations for AI mention the AI Assisted group using `@gitlab-org/modelops/applied-ml`.
- Read about our [AI Integration strategy](https://internal-handbook.gitlab.io/handbook/product/ai-strategy/ai-integration-effort/)
- [AI-human interaction guidelines](https://design.gitlab.com/usability/ai-human-interaction)
- [Highlighting feature versions guidelines](https://design.gitlab.com/usability/feature-management#highlighting-feature-versions)
- [UX maturity requirements](https://about.gitlab.com/handbook/product/ai/ux-maturity/)
- **Slack channels**
  - `#wg_ai_integration` - Slack channel for the working group and the high-level alignment on getting AI ready for Production (Development, Product, UX, Legal, etc.) But from the other channels feel free to reach out and post progress here
  - `#ai_integration_dev_lobby` - Channel for all implementation-related topics and discussions of actual AI features (e.g. explain the code)
  - `#ai_enablement_team` - Channel for the AI Enablement Team which is building the base for all features (experimentation API, Abstraction Layer, Embeddings, etc.)

# Working sections

Working sections for defining experiment, beta, and GA content are below. These sections are used to generate the content in the issue body above.

<details>
<summary>Experiment section</summary>

# [Experiment](https://docs.gitlab.com/ee/policy/experiment-beta-support.html#experiment)

This section should be completed prior to work on the Experiment beginning.

## Problem to be solved

### User problem

Creating the SQL needed to create a visualization is difficult so I don't create custom visualizations or dashboards for Product Analytics.

### Solution hypothesis

If GitLab provides an interface for users (Product/Engineering Managers) to ask questions about Product Analytics data, then they will create more custom visualizations and save them to custom dashboards.

### Assumption

_What assumptions are you making about this problem and the solution?_

* We assume customers have instrumented an app but are not getting value from the default dashboard.
* We assume customers have specific questions they can ask about, like "How many visitors did we have this week compared to last week?"
* We assume customers are more comfortable using plain English rather than writing a YML file to explore their data.

### Personas

_What _[_personas_](https://about.gitlab.com/handbook/product/personas/#list-of-user-personas)_ have this problem, who is the intended user?_

* [Parker, Product Manager](https://handbook.gitlab.com/handbook/product/personas/#parker-product-manager)
* [Delaney, Development Team Lead](https://handbook.gitlab.com/handbook/product/personas/#delaney-development-team-lead)
* [Sasha, Software Developer](https://handbook.gitlab.com/handbook/product/personas/#sasha-software-developer)

## Proposal

Product Analytics-specific implementation of natural language querying.

### Success

_How will you measure whether this experiment is a success?_

We can measure how many custom visualizations are created after launch of the experiment compared to before, per onboarded project. We can also measure engagement with the natural language querying feature itself. That instrumentation is tracked in https://gitlab.com/gitlab-org/gitlab/-/issues/442052+

## Possible Solutions

- Using langchain ([json agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/json.html)) for an implementation like https://www.vizgpt.ai/
- Utilizing a chat interface like how [Cube does with Delphi and Slack](https://cube.dev/blog/conversational-interface-for-semantic-layer)

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/393881+

## Feature release

### Main Job story

_What job to be done will this solve?_

As a user, I want to create a custom visualization of my data using plain words instead of code.

## Proposal updates/additions

### Problem validation

_What validation exists that customers have this problem?_

When we [interviewed users](https://www.youtube.com/playlist?list=PL05JrBw4t0KogNgPHowkNaXy5z1dEmZaj), one of them described how their PM and marketing teams would come up with different visualization requests and put them in a spreadsheet. The development teams would then have to build them from the plain text descriptions provided in the spreadsheet. This is a signal that there is a need for this type of capability: the PMs and marketing teams in this case have already written plain text descriptions of what they want, but currently have no way to create the code that produces the visualization.

### Business objective

_What business objective will be achieved with this proposal?_

Our business goal is to increase usage of Product Analytics so that customers get additional value from the offering, which will lead to account expansion and lower churn rates. Making it easier for users to get to insights and visualize the data in the way they desire gives them additional value from Product Analytics, which advances those goals.

### Confidence

_Has this proposal been derived from research?_

| Confidence | Research |
|------------|----------|
| Medium | https://www.youtube.com/playlist?list=PL05JrBw4t0KogNgPHowkNaXy5z1dEmZaj |

### Requirements

_What tasks or actions should the user be capable of performing with this feature?_

> :warning:️ Related feature and research issues should be linked in the related issues section (Delete this line when this is done)

#### The user needs to be able to:

- Have a way to enter a plain text description of the type of data they want to be displayed in a visualization
- Have a plain text way to specify any specific filtering, exclusion, grouping, etc to be applied to the data
- Have a way to see the results of their query and the data that it returns

</details>

<details>
<summary>Beta section</summary>

# [Beta](https://docs.gitlab.com/ee/policy/alpha-beta-support.html#beta)

_This section should be completed prior to beginning work on the Beta experience._

### [Main Job story](https://about.gitlab.com/handbook/product/ux/jobs-to-be-done/#how-to-write-a-jtbd)

_What job to be done will this solve?_

##### [Small Jobs](https://about.gitlab.com/handbook/product/ux/jobs-to-be-done/#small-jobs)

_What are the small jobs this feature is solving for?_

### Assumption

_What assumptions are you making about this problem and the solution?_

### Proposal updates/additions

### Problem validation

_What validation exists that customers have this problem?_

### Business objective

_What business objective will be achieved with this proposal?_

### Requirements

_What tasks or actions should the user be capable of performing with this feature?_

### The user needs to be able to:

- ...
- ...

#### Success

_How will you measure whether this Beta is a success?_

**UX maturity requirements** [_Beta to GA_](https://about.gitlab.com/handbook/product/ai/ux-maturity/#criteria-and-requirements)

| Criteria | Minimum Requirement | Assessment for GA |
|----------|---------------------|-------------------|
| [Problem validation](https://about.gitlab.com/handbook/product/ai/ux-maturity/#validation-problem-validation)<br>How well do we understand the problem? | [Mix of evidence and assumptions](https://about.gitlab.com/handbook/product/ai/ux-maturity/#questions-to-ask) | |
| [Solution validation](https://about.gitlab.com/handbook/product/ai/ux-maturity/#validation-solution-validation)<br>How usable is the solution? | [Usability testing](https://about.gitlab.com/handbook/product/ux/ux-scorecards/#option-b-perform-a-formative-evaluation) and [Heuristic evaluation](https://about.gitlab.com/handbook/product/ux/ux-scorecards/#option-a-conduct-a-heuristic-evaluation), Avg. task pass rate \>80%, Grade B | |
| [Improve](https://about.gitlab.com/handbook/product/ai/ux-maturity/#build-improve)<br>How successful is the solution? | Quality goals set by the team are reached. | |
| [Design standards](https://about.gitlab.com/handbook/product/ai/ux-maturity/#design-standards) adherence<br>How compliant is the solution with our design standards? | Should adhere to ([Pajamas](https://design.gitlab.com/), [checklist](https://docs.gitlab.com/ee/development/contributing/design.html#checklist)) | |

</details>

<details>
<summary>Generally Available Section</summary>

# [Generally Available](https://docs.gitlab.com/ee/policy/alpha-beta-support.html#generally-available-ga)

### Assumption

_What assumptions are you making about this problem and the solution?_

### Proposal updates/additions

### Problem validation

_What validation exists that customers have this problem?_

### Requirements

_What tasks or actions should the user be capable of performing with this feature?_

> :warning:️ Related feature and research issues should be linked in the related issues section (Delete this line when this is done)

#### The user needs to be able to:

- ...
- ...

</details>

# Implementation Notes

- Prompt Engineering:
  * Limited success asking LLM to output cube.js queries, tends to get confused and include SQL inside measures unnecessarily
  * More success with asking the LLM to give a list of measures, dimensions and filters that satisfy the users question from a given list of available data in our schema, then fit that into our cube.js query
  * Anthropic does well returning XML
  * try YAML?
  * Use the `gitlab:llm:zero_shot:test:questions` rake task for evaluating prompts
  * See [ChainOfThoughtParser](https://gitlab.com/gitlab-org/gitlab/-/blob/79ca83638d54309b09d785829eb7910ffbb9b069/ee/lib/gitlab/llm/chain/parsers/chain_of_thought_parser.rb#L7-7) for example of parsing LLM output
- Implementation Plan
  * Implement `GenerateAnalyticsQueryService` (see [initial MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/127140 'Draft: Resolve "Experiment: natural language querying in visualization designer"') for example)
  * Generate discrete choices for LLM to query from our available analytics data schema and [feed that into LLM prompt](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/127140/diffs#73a7ea0cbf67659fa427b544a064b1de2c9e08a0_0_21 'Draft: Resolve "Experiment: natural language querying in visualization designer"') (i.e. 'here is a list of the available data, give me which conditions satisfy the user's question')
  * Without a list of data to choose from, LLM will be much more likely to hallucinate
  * Output from LLM can be easily validated against this list to ensure cube.js query is valid
  * [Extract](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/127140/diffs#72568944a3a68a924b622196ae378d5417ad2d3e_0_25 'Draft: Resolve "Experiment: natural language querying in visualization designer"') the list of attributes and filters that satisfy the user's query from the LLM and use that to build a valid cube.js query
  * These conditions might need to be shown to the user so they can validate them
  * Use this query to create a custom visualization for the user

## Additional resources

- If you'd like help with technical validation, or would like to discuss UX considerations for AI mention the AI Assisted group using `@gitlab-org/modelops/applied-ml`.
- Original natural language querying strategy: https://gitlab.com/gitlab-org/gitlab/-/issues/393881+
- Read about our [AI Integration strategy](https://internal-handbook.gitlab.io/handbook/product/ai-strategy/ai-integration-effort/)
- Slack channels
  - `#wg_ai_integration` - Slack channel for the working group and the high-level alignment on getting AI ready for Production (Development, Product, UX, Legal, etc.) But from the other channels feel free to reach out and post progress here
  - `#ai_integration_dev_lobby` - Channel for all implementation-related topics and discussions of actual AI features (e.g. explain the code)
  - `#ai_enablement_team` - Channel for the AI Enablement Team which is building the base for all features (experimentation API, Abstraction Layer, Embeddings, etc.)

<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->

*This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*

<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
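
### Sketch of the constrained-choice flow

The flow described in the Implementation Notes (offer the LLM a fixed list of members, parse its tagged reply, validate the picks against that list, then assemble a cube.js query) could look roughly like the Ruby below. This is a minimal sketch and not the code from the linked MR: the member names, the `<measure>`/`<dimension>` tag names, and every method name here are hypothetical, and a canned string stands in for the real model call.

```ruby
# Hypothetical list of queryable members; the real list would come from the
# analytics schema. These names are illustrative assumptions only.
def available_members
  {
    measures: %w[TrackedEvents.count TrackedEvents.uniqueUsersCount],
    dimensions: %w[TrackedEvents.pageUrl TrackedEvents.eventName]
  }
end

# Build a prompt that offers discrete choices, so the model selects from the
# list instead of inventing member names.
def build_prompt(question, members = available_members)
  <<~PROMPT
    Here is a list of the available data.
    Measures: #{members[:measures].join(', ')}
    Dimensions: #{members[:dimensions].join(', ')}
    Wrap each member that satisfies the question in <measure> or <dimension> tags.
    Question: #{question}
  PROMPT
end

# Collect the text of every <tag>...</tag> pair in the raw model output.
def extract_tags(output, tag)
  name = Regexp.escape(tag)
  output.scan(%r{<#{name}>\s*(.*?)\s*</#{name}>}m).flatten
end

# Validate the model's picks against the offered list, then assemble a hash in
# the shape of a cube.js query (measures and dimensions only in this sketch).
def build_cube_query(output, members = available_members)
  measures = extract_tags(output, 'measure')
  dimensions = extract_tags(output, 'dimension')

  unknown = (measures - members[:measures]) + (dimensions - members[:dimensions])
  raise ArgumentError, "unknown members: #{unknown.join(', ')}" unless unknown.empty?
  raise ArgumentError, 'no members selected' if measures.empty? && dimensions.empty?

  { 'measures' => measures, 'dimensions' => dimensions }
end

# Example with a canned reply standing in for a real model call.
reply = 'Use <measure>TrackedEvents.count</measure> by <dimension>TrackedEvents.pageUrl</dimension>.'
query = build_cube_query(reply)
```

Rejecting any member that was not in the offered list is what makes hallucinated names fail early instead of reaching the query layer. Filters are left out to keep the sketch short; they could presumably follow the same pick-then-validate pattern, with each filter's member checked against the list before it is added to the query.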