DAP Flow Session Traceability & Audit Trail for Enterprise Compliance


Problem Statement

Enterprise customers in regulated industries (financial services, healthcare, government) cannot demonstrate governance over AI-assisted changes because there is no comprehensive audit trail linking Duo Agent Platform (DAP) flow executions to the GitLab objects they create or modify.

While some audit infrastructure exists, significant gaps prevent regulated enterprises from meeting SOC2, ISO27001, and FedRAMP compliance requirements for AI-assisted development workflows.

Current State

What Exists Today

Based on a review of the documentation and existing issues:

1. Internal Event Instrumentation

The DAP Instrumentation Guide documents token usage tracking via track_internal_event with AI Context fields:

  • session_id, flow_type, agent_name
  • input_tokens, output_tokens
  • model_provider, model_engine
  • correlation_id, billing_event_id

2. API-Level Audit Events

The Software Development Flow documentation states:

"An audit event is created for each API request done by the Software Development Flow"

These events use the api_request_access_with_scope audit event type (introduced in 17.7; see Issue #499461, closed).

3. Commit Attribution (Resolved)

Issue #557042 (closed) addressed Composite Identity token attribution and was resolved in milestone 18.7. Related: Epic #20119, "Properly attribute authorship to the Composite Identity SA for DAP".

4. GraphQL Usage Data

The Duo and SDLC Trends API provides:

  • AGENT_PLATFORM_SESSION_STARTED events
  • Feature-level aggregation via AiUserMetrics
  • Requires ClickHouse (instance-level data is experimental in 18.7)

5. Limited AI Framework Audit Events

The Audit Event Types documentation lists only two AI-specific events:

  • duo_features_enabled_updated (16.10) - Settings changes
  • api_request_access_with_scope (17.7) - API requests with audited scopes

What Is Missing

1. Flow Session History UI

Gap: No user-facing interface to review past flow sessions

  • Users cannot see what flows they've executed
  • No way to review what a specific session did
  • Cannot correlate outcomes to inputs

2. Session-to-GitLab-Object Linkage

Gap: No metadata connecting session_id to created/modified resources

  • Commits created by DAP don't reference the originating flow session
  • MRs don't link back to the session that created them
  • Comments/notes have no session attribution

3. Flow Lifecycle Audit Events

Gap: Missing critical audit event types:

  • flow_started - When user initiates a flow
  • flow_completed - Successful completion with outcome summary
  • flow_failed - Failure with error context
  • flow_paused / flow_resumed - User intervention events

4. Agent Reasoning/Decision Logs

Gap: No capture of agent decision-making process

  • What plan did the agent create?
  • Why did it choose specific approaches?
  • What alternatives were considered and rejected?

5. Admin Query Capabilities

Gap: No dedicated DAP activity filtering

  • Cannot filter audit logs by flow type
  • No organization-wide DAP usage visibility
  • No compliance reporting dashboards

6. Compliance Integration

Gap: No pre-built compliance tooling

  • No SOC2/ISO27001/FedRAMP report templates
  • No SIEM integration patterns documented
  • No evidence collection automation

Customer Impact

Identified Blockers

Customer | ARR Impact | Compliance Requirement | Status
Large European bank | $3.5M | Governance requirements for AI-assisted changes | Blocking adoption
Major UK financial institution | TBD | Compliance team audit trail requirement | Cannot approve without audit
Regulated Industries (General) | Significant | SOC2/ISO27001 change traceability | GA blocker

Field Feedback

From Solutions Architecture field perspective:

  • "Flow session traceability missing — No linkage: where invoked → what GitLab changes made → audit trail"
  • "Customers in regulated industries cannot adopt DAP without demonstrating AI governance"
  • "Compliance teams asking questions we cannot answer about AI-assisted changes"

Compliance Framework Requirements

SOC2 (CC6.1, CC7.2):

  • All system changes must be authorized and traceable
  • Audit logs must capture who, what, when for changes
  • AI-assisted changes require same rigor as manual changes
  • Authorization must be verified at flow initiation (CC6.1)

ISO 27001 (A.12.4):

  • Event logging of user activities
  • Protection of log information
  • Administrator and operator logs

FedRAMP (AU-2, AU-3):

  • Auditable events defined and documented
  • Content of audit records (user, event, outcome, timestamp)
  • Sufficient detail for forensic analysis
  • Unambiguous user attribution required (AU-3)

User Stories

Compliance Officer

As a compliance officer, I need to audit all AI-assisted code changes to demonstrate governance for regulatory requirements. I need to answer: "What AI agent made this change, why, and who authorized it?"

Developer

As a developer, I want to see what changes a past flow session made so I can understand and review the agent's work. I need to correlate the output to my input to improve my prompts.

Instance Administrator

As an admin, I need to query all DAP activity across my instance for security and compliance reporting. I need to identify anomalous AI usage patterns and generate reports for auditors.

Security Team Member

As a security team member, I need to investigate what a specific flow session did if issues arise. I need a complete timeline of the agent's actions and decisions.

Engineering Manager

As an engineering manager, I need visibility into how my team uses DAP to understand productivity gains and ensure appropriate usage patterns.

Proposed Solution

1. Flow Session Entity Model

Create a first-class FlowSession entity capturing:

# Proposed schema
class FlowSession
  # Identity
  session_id: UUID
  user_id: Integer
  project_id: Integer
  
  # Context
  flow_type: String # "software_development", "fix_pipeline", etc.
  input_prompt: Text # Must be sanitized for PII/secrets before storage
  input_context: JSONB # files, issues, MRs referenced
  
  # Execution
  started_at: DateTime
  completed_at: DateTime
  status: Enum # started, running, paused, completed, failed
  
  # Authorization (CC6.1 compliance)
  authorization_verified_at: DateTime
  authorization_method: String
  authorized_scopes: Array[String]
  
  # Outcomes
  plan_summary: Text # Agent's generated plan
  decisions_log: JSONB # Reasoning steps - must be sanitized for PII/secrets
  
  # Relationships
  has_many :flow_session_resources # Links to commits, MRs, comments
end

class FlowSessionResource
  flow_session_id: UUID
  resource_type: String # "Commit", "MergeRequest", "Note"
  resource_id: String # SHA or ID
  action: String # "created", "modified"
  created_at: DateTime
end
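
The schema lists the session states but not which moves between them are valid. The following is a minimal plain-Ruby sketch (no Rails) of how status transitions and resource links might behave. The transition table and the FlowSessionSketch class are assumptions for illustration only, not part of the proposal.

```ruby
# Hypothetical transition table: the proposal names the states
# (started, running, paused, completed, failed) but not the allowed moves.
ALLOWED_TRANSITIONS = {
  "started"   => %w[running failed],
  "running"   => %w[paused completed failed],
  "paused"    => %w[running failed],
  "completed" => [],
  "failed"    => []
}.freeze

# Mirrors the fields of the proposed FlowSessionResource.
FlowSessionResource = Struct.new(:resource_type, :resource_id, :action, keyword_init: true)

# In-memory stand-in for the proposed FlowSession (illustrative name).
class FlowSessionSketch
  attr_reader :session_id, :status, :resources

  def initialize(session_id:)
    @session_id = session_id
    @status = "started"
    @resources = []
  end

  # Rejects moves that the transition table does not allow,
  # so terminal states (completed, failed) stay terminal.
  def transition_to(new_status)
    unless ALLOWED_TRANSITIONS.fetch(@status).include?(new_status)
      raise ArgumentError, "cannot move from #{@status} to #{new_status}"
    end
    @status = new_status
  end

  # Links a created or modified GitLab object to this session.
  def record_resource(resource_type:, resource_id:, action:)
    @resources << FlowSessionResource.new(
      resource_type: resource_type, resource_id: resource_id, action: action
    )
  end
end
```

Keeping completed and failed as terminal states means an audit record cannot be silently reopened after the fact, which is one possible reading of the traceability requirement.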

2. Flow Lifecycle Audit Events

Add new audit event types:

Event Type | Trigger | Data Captured
duo_flow_started | User initiates flow | session_id, flow_type, user, project, input_summary, authorization_verified_at, authorization_method, authorized_scopes
duo_flow_plan_generated | Agent creates plan | session_id, plan_steps, estimated_actions
duo_flow_paused | User pauses flow | session_id, paused_at, paused_at_step
duo_flow_resumed | User resumes flow | session_id, resumed_at
duo_flow_completed | Flow finishes successfully | session_id, resources_created, duration
duo_flow_failed | Flow encounters error | session_id, error_category, error_code, error_message, failed_at_step, resources_created_before_failure
duo_flow_resource_created | Agent creates GitLab resource | session_id, resource_type, resource_id
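
As a rough illustration of how the "Data Captured" column could be enforced, the sketch below validates that an event carries every listed field before it is emitted. The build_audit_event helper and the payload shape are hypothetical; this does not use GitLab's actual audit event API.

```ruby
require "time"

# Required fields per event type, copied from the table above.
REQUIRED_FIELDS = {
  "duo_flow_started" => %i[session_id flow_type user project input_summary
                           authorization_verified_at authorization_method authorized_scopes],
  "duo_flow_plan_generated" => %i[session_id plan_steps estimated_actions],
  "duo_flow_paused" => %i[session_id paused_at paused_at_step],
  "duo_flow_resumed" => %i[session_id resumed_at],
  "duo_flow_completed" => %i[session_id resources_created duration],
  "duo_flow_failed" => %i[session_id error_category error_code error_message
                          failed_at_step resources_created_before_failure],
  "duo_flow_resource_created" => %i[session_id resource_type resource_id]
}.freeze

# Hypothetical helper: builds an event hash, rejecting unknown event types
# and payloads that lack a required field. Unlisted fields are dropped.
def build_audit_event(name, details, now: Time.now.utc)
  required = REQUIRED_FIELDS.fetch(name) do
    raise ArgumentError, "unknown event type: #{name}"
  end
  missing = required - details.keys
  raise ArgumentError, "#{name} is missing: #{missing.join(', ')}" unless missing.empty?

  { name: name, created_at: now.iso8601, details: details.slice(*required) }
end
```

Dropping unlisted fields is a deliberate choice in this sketch: it keeps stray data, such as raw prompt text, out of the audit stream unless the event definition asks for it.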

3. Session Linkage to GitLab Objects

Attach session metadata to created resources:

Commits:

X-GitLab-Duo-Session: <session_id>

Written as a trailer in the commit message (per Git trailer convention), or stored in commit metadata or notes.
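
For the trailer variant, the following sketch shows one way to append and read back the session trailer on a commit message string. The helper names are hypothetical, and the trailer detection is simplified; in a real repository, git interpret-trailers handles the edge cases this sketch ignores.

```ruby
TRAILER_KEY = "X-GitLab-Duo-Session"
# Simplified shape of a trailer line: "Token: value".
TRAILER_LINE = /\A[A-Za-z0-9-]+: \S/

# Appends the session trailer. If the message already ends with a trailer
# block, the new trailer joins it; otherwise a blank line starts a new block.
def add_session_trailer(message, session_id)
  body = message.rstrip
  paragraphs = body.split(/\n{2,}/)
  trailer_block = paragraphs.length > 1 &&
                  paragraphs.last.lines.all? { |line| line.match?(TRAILER_LINE) }
  separator = trailer_block ? "\n" : "\n\n"
  "#{body}#{separator}#{TRAILER_KEY}: #{session_id}\n"
end

# Returns the session ID from a commit message, or nil if no trailer is present.
def session_id_from(message)
  message[/^#{Regexp.escape(TRAILER_KEY)}: (\S+)\s*$/, 1]
end
```

A trailer travels with the commit through clones and mirrors, whereas database-side metadata stays on the originating instance; which property matters more is a design decision the proposal leaves open.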

Merge Requests:

  • System note: "Created by Duo Agent Platform flow session <session_id>"
  • MR description footer with session link

Comments/Notes:

  • Store duo_flow_session_id in notes table

4. User Session History UI

New page: /-/duo/sessions (or within Activity)

Features:

  • List of user's flow sessions with status
  • Filter by flow type, date range, project
  • Filter by user (for admin/compliance use - supports FedRAMP AU-3)
  • Session detail view showing:
    • Input prompt and context
    • Plan generated
    • Resources created with links
    • Timeline of agent actions
    • Duration and token usage
  • Clear linkage from session to authenticated GitLab user (Composite Identity attribution chain documented)

5. Admin DAP Audit Dashboard

New admin area: /admin/duo_agent_platform/audit

Features:

  • Organization-wide flow session listing
  • Filter by user, project, flow type, date range
  • Export capabilities for compliance reports
  • Anomaly detection (unusual usage patterns)
  • Integration with existing audit log infrastructure
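
To make the filter and export features concrete, here is a small sketch that filters in-memory session records and serializes the result as JSON Lines, a format many log pipelines accept. The record shape and function names are assumptions; a real implementation would query the database rather than an array.

```ruby
require "json"
require "time"

# Filters session records (hashes with symbol keys) by the dashboard criteria.
# A nil criterion means "do not filter on this field".
def filter_sessions(sessions, user_id: nil, flow_type: nil, from: nil, to: nil)
  sessions.select do |s|
    started = Time.iso8601(s[:started_at])
    (user_id.nil? || s[:user_id] == user_id) &&
      (flow_type.nil? || s[:flow_type] == flow_type) &&
      (from.nil? || started >= from) &&
      (to.nil? || started <= to)
  end
end

# One JSON object per line, so an export can be streamed or tailed.
def to_json_lines(sessions)
  sessions.map { |s| JSON.generate(s) }.join("\n")
end
```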

6. API for Compliance Tooling

GraphQL and REST endpoints:

query {
  duoFlowSessions(
    projectId: "gid://gitlab/Project/123"
    userId: "gid://gitlab/User/456"  # FedRAMP user attribution support
    startDate: "2025-01-01"
    endDate: "2025-01-31"
  ) {
    nodes {
      sessionId
      user { username }
      flowType
      status
      startedAt
      completedAt
      authorizationVerifiedAt
      authorizationMethod
      resourcesCreated {
        resourceType
        resourceId
        webUrl
      }
      planSummary
      decisionsLog
    }
  }
}
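
A compliance tool consuming this query would likely flatten the nested result into one row per created resource. The sketch below assumes the response follows the selection set of the query above (data, then duoFlowSessions, then nodes); since the duoFlowSessions field is itself only proposed, the whole shape is hypothetical.

```ruby
require "json"

# Turns a parsed GraphQL response into flat "who / what / when" rows,
# one per resource created by a session.
def evidence_rows(response)
  nodes = response.dig("data", "duoFlowSessions", "nodes") || []
  nodes.flat_map do |session|
    (session["resourcesCreated"] || []).map do |resource|
      {
        session_id: session["sessionId"],
        username: session.dig("user", "username"),
        flow_type: session["flowType"],
        started_at: session["startedAt"],
        resource_type: resource["resourceType"],
        resource_id: resource["resourceId"]
      }
    end
  end
end
```

Sessions that created no resources yield no rows here; a report that must also list such sessions would need a separate pass.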

7. Sensitive Data Handling

Critical requirement: input_prompt and decisions_log must be sanitized before storage to prevent exposure of:

  • Personally Identifiable Information (PII)
  • Secrets, tokens, or credentials
  • Other sensitive data that may be present in user prompts or agent reasoning
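
As a minimal sketch of what sanitization before storage could look like, the code below redacts a few recognizable patterns with regular expressions. The pattern list is illustrative and far from exhaustive; real PII and secret detection would need a dedicated scanner, and the names here are assumptions.

```ruby
# Illustrative patterns only. Order matters little here because the
# replacement labels do not match any of the patterns themselves.
REDACTIONS = {
  /\bglpat-[A-Za-z0-9_-]{20,}/ => "[REDACTED_GITLAB_TOKEN]",   # GitLab personal access token prefix
  /\bAKIA[0-9A-Z]{16}\b/ => "[REDACTED_AWS_KEY]",              # AWS access key ID shape
  /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/ => "[REDACTED_EMAIL]"
}.freeze

# Applies every redaction in turn and returns a new string;
# the input is left unmodified.
def sanitize(text)
  REDACTIONS.reduce(text) { |acc, (pattern, label)| acc.gsub(pattern, label) }
end
```

Redacting to a typed label rather than deleting the match keeps the stored prompt readable for auditors while still showing that something sensitive was present.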

Acceptance Criteria

MVP (P1)

Core principle: Who initiated | What was requested | What was created | When

  • Flow sessions logged with: user, timestamp, flow_type, project, input_summary
  • duo_flow_started audit event with authorization context (authorization_verified_at, authorization_method, authorized_scopes)
  • duo_flow_completed audit event with resources_created, duration
  • duo_flow_failed audit event with error_category, error_code, error_message, failed_at_step, resources_created_before_failure
  • duo_flow_paused and duo_flow_resumed audit events
  • Session ID attached to commits/MRs created by DAP (visible in UI)
  • Basic session list available to users
  • Default retention period of 1 year for session data
  • Unambiguous user attribution - user_id in duo_flow_started linked to authenticated GitLab user
  • Sensitive data sanitization - input_prompt and decisions_log sanitized for PII/secrets before storage
  • Documentation for compliance teams on available audit data
  • Documentation on Composite Identity attribution chain (for FedRAMP)

Enhanced (P2)

  • Full agent plan and reasoning captured per session
  • User session history UI with detail view
  • Admin audit dashboard with filtering
  • GraphQL API for session queries (including user filtering)
  • Audit event streaming support for DAP events

Enterprise (P3)

  • SIEM integration patterns documented
  • Pre-built compliance report templates (SOC2, ISO27001)
  • Session comparison and diff views
  • Alerting on unusual DAP usage patterns
  • Configurable retention policy (beyond 1-year default)

Technical Considerations

Architecture Ownership

Component | Team | Implementation
FlowSession model | Agent Foundations | Rails
Audit events | Compliance (Govern) | Rails
Session UI | AI Framework | Vue.js
API endpoints | Agent Foundations | GraphQL/REST
AI Gateway logging | AI Framework | AI Gateway service

Data Storage

  • Session metadata: PostgreSQL (GitLab database)
  • Agent reasoning/plans: May require dedicated storage due to size
  • Analytics/aggregation: ClickHouse (existing AI metrics pattern)
  • Long-term retention: Consider archive strategy
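
One possible shape for an archive strategy is to split sessions by age against the retention window. This sketch treats the 1-year default as 365 days, which is an assumption, and uses hypothetical names throughout.

```ruby
# Assumption: "1 year" is interpreted as 365 days.
DEFAULT_RETENTION_DAYS = 365
SECONDS_PER_DAY = 86_400

# Splits sessions into [keep, archive]. Sessions without a completed_at
# (still running or paused) are always kept.
def partition_by_retention(sessions, now:, retention_days: DEFAULT_RETENTION_DAYS)
  cutoff = now - (retention_days * SECONDS_PER_DAY)
  sessions.partition do |s|
    s[:completed_at].nil? || s[:completed_at] >= cutoff
  end
end
```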

Performance Considerations

  • Session creation should not impact flow latency
  • Audit events should be asynchronous (Sidekiq)
  • Admin queries may need pagination and caching
  • Consider GDPR implications for session data retention
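
The asynchronous requirement can be illustrated without Sidekiq. The sketch below swaps in a plain Ruby thread and an in-process Queue as a stand-in, so it shows only the pattern (the flow enqueues and moves on; a worker persists later), not the durability a real job system provides. All names are hypothetical.

```ruby
# Stand-in for an async audit pipeline: a Thread plus Queue, NOT Sidekiq.
class AsyncAuditLog
  attr_reader :written

  def initialize
    @queue = Queue.new
    @written = []
    # The worker drains the queue; Queue#pop returns nil once the
    # queue is closed and empty, which ends the loop.
    @worker = Thread.new do
      while (event = @queue.pop)
        @written << event # a real worker would persist the event here
      end
    end
  end

  # Called on the flow's hot path: enqueue only, no I/O.
  def record(event)
    @queue << event
    nil
  end

  # Stops accepting events and waits for the backlog to drain.
  def shutdown
    @queue.close
    @worker.join
  end
end
```

An in-process queue loses events if the process dies, so this pattern alone would not satisfy an audit requirement; it only shows why recording need not add latency to the flow.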

Self-Managed Requirements

  • All audit data must be stored locally
  • No external dependencies for compliance features
  • Admin configuration for retention periods
  • Export formats compatible with common SIEM tools

Related Issues

  • Issue #557042 - Commit attribution (closed, 18.7)
  • Epic #20119 - Composite Identity authorship attribution
  • Issue #553573 - Duo as team member vision (discusses tracking)
  • Issue #549846 - Usage billing reporting (closed)
  • Issue #499461 - API request audit events (closed; referenced in docs)
  • Issue #431738 - Bot token audit concerns (closed)

Competitive Context

GitHub Copilot currently lacks comprehensive audit trail capabilities for AI-assisted changes, representing an opportunity for GitLab differentiation in regulated enterprise markets. Customer feedback indicates audit capabilities are a key selection criterion for financial services and healthcare organizations evaluating AI coding assistants.
