Opt-In Setting for AI Interactions Data Collection
## Problem to Solve

GitLab cannot systematically improve DAP quality without access to real-world AI interaction data. While we're adding [customer feedback capture mechanisms](https://gitlab.com/gitlab-org/gitlab/-/work_items/578583) (thumbs up/down, qualitative comments), these alone don't provide the diagnostic depth needed to reproduce issues, evaluate model performance, or build regression tests.

## Scope to Deliver

Introduce an **admin-controlled opt-in toggle** on the Duo settings page that converts the existing Extended Logging feature flag into a discoverable, customer-facing control that collects AI interaction data (e.g. prompt and response text, session context).

**Setting specifications:**

* **Location:** Top-level namespace/instance admin settings
* **Control level:** Instance administrators or top-level group owners
* **Default state:** **Disabled** (requires explicit opt-in)
* **Target audience:** **Paid tier customers only** (all DAP customers at GA are paid)
* **Privacy protection:** User identifiers (user_ids, usernames) are **not stored** with AI interaction data

See full proposal [here](https://docs.google.com/document/d/1gH4jj5wgpp1NgT_QzZ1qZjq3855tvWHgKpM8W2CSdeI/edit?tab=t.0).

## Legal & Compliance Validation

**Legal confirmed** ([Legal Issue #3106](https://gitlab.com/gitlab-com/legal-and-compliance/-/issues/3106)): Admin-level control is sufficient; no per-user opt-in is required. No further legal approval is required for the paid tier (this scope).

## Out of scope for this iteration

* Free tier customers (deferred until the DAP free tier launch; requires a separate legal/comms strategy because we would introduce new language to the Terms and Agreement)

## Design

![image.png](/uploads/f4335bd537d0ce96e72beecb0fb6d590/image.png){width="700" height="350"}

## Opt-In Rate Projection & Its Implication to Data Storage and Volume

Based on industry benchmarks, we estimate that about 15%-25% of paid customers will opt in as they onboard to DAP.
We aim to increase the opt-in rate by:

* leading with value: partner with the field to illustrate the value proposition (post 18.9 delivery)
* integrating into the DAP admin onboarding flow (to be planned)
* providing opt-in incentives (to be planned)

Depending on data volume, traces may be large and could increase storage costs, operational load, and retrieval/analysis complexity. We will mitigate this through retention limits, sampling, and caps. We will handle this with LangSmith.

cc: @bastirehm @abacon-gitlab @ashrafkhamis
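
To make the mitigation above (retention limits, sampling, and caps) more concrete, here is a minimal Python sketch of what such a policy could look like. It is an illustration only: `TracePolicy`, the helper functions, and all default values are hypothetical, and it neither uses nor represents the LangSmith API or any decided configuration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical policy object; the numbers are placeholders, not agreed values.
@dataclass(frozen=True)
class TracePolicy:
    sample_rate: float = 0.10                  # fraction of traces to keep
    max_chars: int = 20_000                    # cap on stored text per trace
    retention: timedelta = timedelta(days=30)  # how long a stored trace is kept


def is_sampled(trace_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of traces, based on a hash of the id."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate


def cap_text(text: str, max_chars: int) -> str:
    """Truncate oversized text so a single trace cannot exceed the cap."""
    return text if len(text) <= max_chars else text[:max_chars]


def is_expired(stored_at: datetime, now: datetime, retention: timedelta) -> bool:
    """A trace becomes eligible for deletion once it is older than the retention window."""
    return now - stored_at > retention
```

Hash-based sampling is used here instead of a random draw so that the keep/drop decision for a given trace is reproducible. Whether sampling should happen per trace, per session, or per customer is an open design question this sketch does not answer.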