Opt-In Setting for AI Interaction Data Collection
## Problem to Solve
GitLab cannot systematically improve GitLab Duo Agent Platform (DAP) quality without access to real-world AI interaction data. We are adding [customer feedback capture mechanisms](https://gitlab.com/gitlab-org/gitlab/-/work_items/578583) (thumbs up/down, qualitative comments), but these alone don't provide the diagnostic depth needed to reproduce issues, evaluate model performance, or build regression tests.
## Scope to Deliver
Introduce an **admin-controlled opt-in toggle** on the Duo settings page. The toggle converts the existing Extended Logging feature flag into a discoverable, customer-facing control that, when enabled, collects AI interaction data such as prompt and response text and session context.
**Setting specifications:**
* **Location:** Top-level namespace/instance admin settings
* **Control level:** Instance administrators or top-level group owners
* **Default state:** **Disabled** (requires explicit opt-in)
* **Target audience:** **Paid tier customers only** (all DAP customers at GA are paid)
* **Privacy protection:** User identifiers (user_ids, usernames) are **not stored** with AI interaction data
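The specification above boils down to a single gate plus a redaction step. The sketch below illustrates one way to express it, assuming a per-namespace settings object with a paid-tier flag and an opt-in flag. All names (`NamespaceSettings`, `ai_interaction_data_opt_in`, `maybe_build_record`, the `user_id`/`username` record keys) are hypothetical and do not reflect GitLab's actual schema or implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record keys treated as user identifiers; the real schema may differ.
USER_IDENTIFIER_KEYS = frozenset({"user_id", "username"})


@dataclass(frozen=True)
class NamespaceSettings:
    """Hypothetical view of a top-level namespace (or instance) configuration."""

    is_paid_tier: bool
    # Default state is disabled: collection requires an explicit admin opt-in.
    ai_interaction_data_opt_in: bool = False


def should_collect(settings: NamespaceSettings) -> bool:
    """Collect only for paid-tier namespaces whose admin has opted in."""
    return settings.is_paid_tier and settings.ai_interaction_data_opt_in


def strip_user_identifiers(record: dict) -> dict:
    """Return a copy of an interaction record with user identifiers removed."""
    return {k: v for k, v in record.items() if k not in USER_IDENTIFIER_KEYS}


def maybe_build_record(settings: NamespaceSettings, record: dict) -> Optional[dict]:
    """Return a storable record, or None when collection is not permitted."""
    if not should_collect(settings):
        return None
    return strip_user_identifiers(record)
```

With the default settings, `maybe_build_record(NamespaceSettings(is_paid_tier=True), record)` returns `None`; only after the admin opts in does it return the record, minus `user_id` and `username`. Keeping redaction in a separate function means the identifier-stripping rule can be tested independently of the gating rule.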
See the full proposal [here](https://docs.google.com/document/d/1gH4jj5wgpp1NgT_QzZ1qZjq3855tvWHgKpM8W2CSdeI/edit?tab=t.0).
## Legal & Compliance Validation
**Legal confirmed** ([Legal Issue #3106](https://gitlab.com/gitlab-com/legal-and-compliance/-/issues/3106)): admin-level control is sufficient, and no per-user opt-in is required. No further legal approval is required for the paid tier (the scope of this iteration).
## Out of Scope for This Iteration
* Free tier customers: deferred until the DAP free tier launch. This requires a separate legal/comms strategy because we will introduce new language into the Terms and Agreement.
## Design
## Opt-In Rate Projection & Its Implications for Data Storage and Volume
Based on industry benchmarks, we estimate that about 15% to 25% of paid customers will opt in as they onboard to DAP. We aim to increase the opt-in rate by:
* Leading with value: partnering with the field team to illustrate the value proposition (after the 18.9 delivery)
* Integrating the setting into the DAP admin onboarding flow (to be planned)
* Providing opt-in incentives (to be planned)
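To connect the projected opt-in rate to storage volume, a rough back-of-envelope estimate helps. The symbols below are assumptions introduced for illustration only: $N_{\text{paid}}$ is the number of paid DAP namespaces, $r$ is the opt-in rate, $\bar{t}$ is the average number of traces per opted-in namespace per day, $s$ is a sampling rate, and $C$ is a daily cap.

$$
V_{\text{daily}} \approx N_{\text{paid}} \cdot r \cdot \bar{t}, \qquad r \in [0.15,\ 0.25]
$$

$$
V_{\text{stored}} \le \min\left(s \cdot V_{\text{daily}},\ C\right)
$$

Because volume scales linearly with $r$, the upper end of the projection implies about $0.25 / 0.15 \approx 1.7\times$ the trace volume of the lower end, before sampling and caps are applied.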
Depending on data volume, traces may be large, which could increase storage costs, operational load, and retrieval/analysis complexity. We will mitigate this with retention limits, sampling, and volume caps, handled in LangSmith.
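The three mitigations (sampling, caps, retention limits) can be illustrated with a small, storage-agnostic sketch. It assumes deterministic hash-based sampling keyed on a trace ID, an in-process daily cap, and an age-based retention check. The function names, the daily-cap policy, and the parameters are hypothetical; this does not use or describe LangSmith's own configuration options.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def sampled(trace_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of traces, keyed by trace ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Use 53 bits of the hash so the division is exact and the bucket lies in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < sample_rate


class DailyCap:
    """Admit at most `limit` traces per UTC day (hypothetical in-process counter)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._day = None
        self._count = 0

    def admit(self, now: datetime) -> bool:
        # Expects a timezone-aware datetime; the counter resets at each new UTC day.
        day = now.astimezone(timezone.utc).date()
        if day != self._day:
            self._day, self._count = day, 0
        if self._count >= self.limit:
            return False
        self._count += 1
        return True


def expired(created_at: datetime, now: datetime, retention_days: int) -> bool:
    """True when a stored trace is older than the retention window."""
    return now - created_at > timedelta(days=retention_days)


def should_store(trace_id: str, now: datetime, sample_rate: float, cap: DailyCap) -> bool:
    """Sample first, so only sampled traces consume the daily cap."""
    return sampled(trace_id, sample_rate) and cap.admit(now)
```

Hashing the trace ID, rather than drawing a random number, means every span of a given trace gets the same keep/drop decision, so stored traces stay complete.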
cc: @bastirehm @abacon-gitlab @ashrafkhamis