Experiment: natural language querying in visualization designer

⚠ Promoted to the following epic Customer MVC: natural language querying in visu... (&12245)

Experiment

This section should be completed prior to work on the Experiment beginning.

Experiment

Problem to be solved

User problem

Writing the SQL needed for a visualization is difficult, so I don't create custom visualizations or dashboards for Product Analytics.

Solution hypothesis

If GitLab provides an interface for users (Product/Engineering Managers) to ask questions about Product Analytics data, then they will create more custom visualizations and save them to custom dashboards.

Assumption

What assumptions are you making about this problem and the solution?

We assume customers have instrumented an app but are not getting value from the default dashboards.
We assume customers have specific questions they want to ask, such as "How many visitors did we have this week compared to last week?"

Personas

What personas have this problem, who is the intended user?

Engineering Manager
Product Manager

Proposal

A Product Analytics-specific implementation of natural language querying. It may be less useful in the visualization designer, but the interface might be viable in the context of data exploration.

Success

How will you measure whether this experiment is a success?

We can measure how many custom visualizations are created per onboarded project after the launch of the experiment, compared to before the launch.

Possible Solutions

Using LangChain (JSON agent) for an implementation like https://www.vizgpt.ai/
Using a chat interface, as Cube does with Delphi and Slack

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/393881+

Feature release

Main Job story

What job to be done will this solve?

Proposal updates/additions

Problem validation

What validation exists that customers have this problem?

Business objective

What business objective will be achieved with this proposal?

Confidence

Has this proposal been derived from research?

Confidence	Research
[High/Medium/Low]	research/insight issue

Requirements

What tasks or actions should the user be capable of performing with this feature?

⚠️ Related feature and research issues should be linked in the related issues section (Delete this line when this is done)

The user needs to be able to:

Checklist

Experiment

Issue information

Feature release

Issue information

Add information to the issue body about:
- Your proposal
- The Job Statement it's expected to satisfy
- Details about the user problem and provide any research or problem validation
- List the personas impacted by the proposal.
Add all relevant solution validation issues to the Linked items section that shows this proposal will solve the customer problem, or details explaining why it's not possible to provide that validation.
Add relevant designs to the Design Management area of the issue.
You have adhered to our Definition of Done standards
Add the wg-ai-integration label to this issue so that it is visible to the various teams working on this

Technical needs

Please consider the operational aspects of the feature you are creating. A list of things to think about is in: https://gitlab.com/gitlab-org/gitlab/-/issues/403859. We will be improving this process in the future: !117637 (comment 1353253349).
@ mention your AppSec Stable Counterpart and read the AI secure coding guidelines

Work estimate and skills needed to build a viable ML feature: Depending on the work, many personas contribute to building an ML feature, including Data Scientist, NLP Engineer, ML Engineer, MLOps Engineer, ML Infra Engineer, and Fullstack Engineer to integrate the ML services with GitLab. Post-prototype, we would assess the skills needed to turn the prototype into a production-grade ML feature.
Data Limitation: We would like to validate up front whether we have viable data for the feature, including whether we can use the DataOps pipeline of ModelOps or need to create a custom one. We would want to understand the training data, test data, and feedback data to dial up the accuracy, as well as the limitations of the data.
Model Limitation: We would want to understand whether we can use an open-source pre-trained model, tune and customize it, or start a model from scratch. Further, we would use the ModelOps model evaluation framework to assess which model is right for the use case.
Cost, Scalability, Reliability: We would want to estimate the cost of hosting, serving, and inference of the model, and of the full end-to-end infrastructure, including monitoring and observability.
Legal and Ethical Framework: We would want to align with the legal and ethical framework, like any other ModelOps feature, to cover the nine principles of responsible ML and any legal support needed.

Dependency needs

Please consider the operational aspects of the service you are creating. A list of things to think about is in: https://gitlab.com/gitlab-org/gitlab/-/issues/403859. We will be improving this process in the future: !117637 (comment 1353253349).

Legal needs

Implementation Notes

Prompt Engineering:
- Limited success asking the LLM to output cube.js queries directly; it tends to get confused and include SQL inside measures unnecessarily
- More success asking the LLM for a list of measures, dimensions and filters that satisfy the user's question, chosen from a given list of available data in our schema, then fitting that list into our cube.js query
- Anthropic does well returning XML
  - try YAML?
- Use the gitlab:llm:zero_shot:test:questions rake task for evaluating prompts
- See ChainOfThoughtParser for example of parsing LLM output
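The list-selection approach with XML output noted above could look roughly like the following. This is a minimal sketch in Python, not the GitLab Ruby implementation and not ChainOfThoughtParser. The schema member names (`Sessions.count`, `PageViews.url`, and so on), the `<selection>` tag layout, and both function names are hypothetical placeholders chosen for illustration. Filters are left out for brevity and would follow the same pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema members; the real list would come from the analytics schema.
AVAILABLE = {
    "measures": ["Sessions.count", "PageViews.count"],
    "dimensions": ["PageViews.url", "Sessions.startAt"],
}


def build_prompt(question: str) -> str:
    """Offer the model a closed list of schema members and ask for XML back."""
    lines = ["Here is a list of the available data:"]
    for kind, members in AVAILABLE.items():
        lines.append(f"{kind}: {', '.join(members)}")
    lines.append(
        "Reply only with XML of the form "
        "<selection><measure>...</measure><dimension>...</dimension></selection> "
        "using names from the list above."
    )
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def parse_selection(llm_output: str) -> dict:
    """Find the <selection> element in the model output and read its children."""
    start = llm_output.find("<selection>")
    end = llm_output.find("</selection>")
    if start == -1 or end == -1:
        raise ValueError("no <selection> block in model output")
    # Ignore any chatter the model adds before or after the XML block.
    root = ET.fromstring(llm_output[start : end + len("</selection>")])
    return {
        "measures": [el.text.strip() for el in root.findall("measure") if el.text],
        "dimensions": [el.text.strip() for el in root.findall("dimension") if el.text],
    }
```

Slicing out the `<selection>` block before parsing means that extra text around the XML does not break the parser, while output with no such block is rejected outright.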
Implementation Plan:
- Implement GenerateAnalyticsQueryService (see initial MR for example)
- Generate discrete choices for the LLM to pick from, based on our available analytics data schema, and feed them into the LLM prompt (e.g. 'here is a list of the available data, tell me which conditions satisfy the user's question')
  - Without a list of data to choose from, the LLM will be much more likely to hallucinate
  - Output from the LLM can be easily validated against this list to ensure the cube.js query is valid
- Extract the list of attributes and filters that satisfy the user's query from the LLM output and use that to build a valid cube.js query
  - These conditions might need to be shown to the user so they can validate them
- Use this query to create a custom visualization for the user
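The validate-then-build steps in the plan above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not GenerateAnalyticsQueryService itself: the schema member names, the allowed-operator subset, and the `build_cube_query` helper are hypothetical. The output shape follows the Cube JSON query format (`measures`, `dimensions`, and `filters` entries with `member`, `operator`, and `values`).

```python
# Hypothetical schema members offered to the LLM; real values would come from our schema.
AVAILABLE_MEASURES = {"Sessions.count", "PageViews.count"}
AVAILABLE_DIMENSIONS = {"PageViews.url", "Sessions.startAt"}
# Assumed subset of Cube filter operators that the prompt would offer.
ALLOWED_OPERATORS = {"equals", "notEquals", "contains"}


def build_cube_query(selection: dict) -> dict:
    """Check the LLM's selection against the offered lists, then build a query dict."""
    measures = list(selection.get("measures", []))
    dimensions = list(selection.get("dimensions", []))
    filters = list(selection.get("filters", []))

    # Anything that was not in the offered lists is treated as a hallucination.
    rejected = [m for m in measures if m not in AVAILABLE_MEASURES]
    rejected += [d for d in dimensions if d not in AVAILABLE_DIMENSIONS]
    for f in filters:
        if f.get("member") not in AVAILABLE_DIMENSIONS:
            rejected.append(f.get("member"))
        if f.get("operator") not in ALLOWED_OPERATORS:
            rejected.append(f.get("operator"))
    if rejected:
        raise ValueError(f"not in the offered list: {rejected}")

    return {
        "measures": measures,
        "dimensions": dimensions,
        "filters": [
            {
                "member": f["member"],
                "operator": f["operator"],
                # Cube expects filter values as an array of strings.
                "values": [str(v) for v in f.get("values", [])],
            }
            for f in filters
        ],
    }
```

Because the query is assembled only from members that passed the check, the validated selection is also a natural thing to show the user for confirmation before the visualization is created.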

Additional resources

If you'd like help with technical validation, or would like to discuss UX considerations for AI, mention the AI Assisted group using @gitlab-org/modelops/applied-ml.
Read about our AI Integration strategy
Slack channels
- #wg_ai_integration - Slack channel for the working group and the high-level alignment on getting AI ready for Production (Development, Product, UX, Legal, etc.). If you are coming from the other channels, feel free to reach out and post progress here
- #ai_integration_dev_lobby - Channel for all implementation related topics and discussions of actual AI features (e.g. explain the code)
- #ai_enablement_team - Channel for the AI Enablement Team which is building the base for all features (experimentation API, Abstraction Layer, Embeddings, etc.)

Edited Feb 20, 2024 by Dennis Tang