Feature negotiation, OTLP export client, and first job_execution span

Summary

Implement feature negotiation and the OTLP export client, then emit the first job_execution span as proof of end-to-end integration. This is Phase 1 of Runner instrumentation.

Goals

  1. Read the job_telemetry feature flag and trace_context from the job payload
  2. Configure an OTLP exporter via LabKit targeting the OTEL Collector endpoint
  3. Emit a single job_execution span that covers the full job lifecycle
  4. Validate the end-to-end pipeline: Runner → OTEL Collector → ClickHouse → Grafana

Requirements

Feature Negotiation

  1. Read features.job_telemetry from the job payload response. Runner already supports GitlabFeatures, so this only adds a new feature key
  2. Only initialize telemetry when job_telemetry is true
  3. Read trace_context.trace_id and trace_context.parent_span_id from the job payload
  4. Handle missing fields gracefully: if features.job_telemetry or a trace_context field is absent, treat telemetry as disabled
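
The negotiation rules above could be sketched roughly as follows. This is a minimal illustration, not Runner's actual code: the jobPayload, payloadFeatures, and payloadTraceContext types and the telemetryConfig function are hypothetical names, and only the JSON field names come from this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobPayload mirrors only the fields this issue cares about.
// The Go type and field names are hypothetical, not Runner's real structs.
type jobPayload struct {
	Features     *payloadFeatures     `json:"features"`
	TraceContext *payloadTraceContext `json:"trace_context"`
}

type payloadFeatures struct {
	JobTelemetry bool `json:"job_telemetry"`
}

type payloadTraceContext struct {
	TraceID      string `json:"trace_id"`
	ParentSpanID string `json:"parent_span_id"`
}

// telemetryConfig returns the trace context and true only when the feature
// flag is set and both IDs are present. Any missing or malformed input
// results in telemetry being treated as disabled.
func telemetryConfig(raw []byte) (payloadTraceContext, bool) {
	var p jobPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return payloadTraceContext{}, false
	}
	if p.Features == nil || !p.Features.JobTelemetry {
		return payloadTraceContext{}, false
	}
	if p.TraceContext == nil || p.TraceContext.TraceID == "" || p.TraceContext.ParentSpanID == "" {
		return payloadTraceContext{}, false
	}
	return *p.TraceContext, true
}

func main() {
	enabled := []byte(`{
		"features": {"job_telemetry": true},
		"trace_context": {
			"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
			"parent_span_id": "00f067aa0ba902b7"
		}
	}`)
	tc, ok := telemetryConfig(enabled)
	fmt.Println(ok, tc.TraceID, tc.ParentSpanID)

	// A payload without the flag or the trace context disables telemetry.
	_, ok = telemetryConfig([]byte(`{"features": {}}`))
	fmt.Println(ok)
}
```

Returning a single boolean keeps the caller's decision simple: the job runs the same way in both cases, and only the telemetry setup is skipped.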

OTLP Export Client

  1. Use LabKit for OTEL SDK integration — aligns Runner's telemetry with the rest of the GitLab platform
  2. The OTEL Collector endpoint is a static runner manager configuration value (e.g., the GITLAB_TRACING connection string or a dedicated config field)
  3. Authenticate using OIDC/workload identity token on GitLab.com hosted runners
  4. Handle connection failures gracefully — telemetry failures must never fail jobs
  5. Buffer spans locally and retry failed exports; drop spans that still cannot be sent after a timeout
  6. Respect backpressure from the OTEL Collector
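
One way to picture the "never fail jobs" and "buffer, retry, drop" requirements is the standard-library sketch below. It does not use LabKit or the OTEL SDK, which would normally provide batching and retry themselves; spanQueue, exportWithRetry, and errUnreachable are hypothetical names, and the capacities and durations are placeholders rather than proposed values.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// errUnreachable stands in for any export failure (hypothetical).
var errUnreachable = errors.New("collector unreachable")

// spanQueue is a bounded buffer between job execution and the exporter.
// Enqueue never blocks, so a slow or absent collector cannot stall a job.
type spanQueue struct {
	ch      chan string
	dropped int64
}

func newSpanQueue(capacity int) *spanQueue {
	return &spanQueue{ch: make(chan string, capacity)}
}

// Enqueue returns false and counts a drop when the buffer is full.
func (q *spanQueue) Enqueue(span string) bool {
	select {
	case q.ch <- span:
		return true
	default:
		atomic.AddInt64(&q.dropped, 1)
		return false
	}
}

// Dropped reports how many spans were discarded because the buffer was full.
func (q *spanQueue) Dropped() int64 { return atomic.LoadInt64(&q.dropped) }

// exportWithRetry calls send until it succeeds or the timeout elapses,
// doubling the delay between attempts. The returned error is meant for
// logging only; callers must not propagate it into the job result.
func exportWithRetry(send func() error, initial, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	delay := initial
	for {
		err := send()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("span dropped after %s: %w", timeout, err)
		case <-time.After(delay):
			delay *= 2
		}
	}
}

func main() {
	q := newSpanQueue(1)
	fmt.Println(q.Enqueue("job_execution"), q.Enqueue("overflow"), q.Dropped())

	err := exportWithRetry(func() error { return errUnreachable }, 5*time.Millisecond, 50*time.Millisecond)
	fmt.Println("export error (logged, job unaffected):", err)
}
```

Dropping on a full buffer is only one possible response to backpressure. Honoring explicit throttling signals from the collector is not shown here.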

First job_execution Span

  1. Create a single span named job_execution that wraps the entire job lifecycle
  2. Set trace_id and parent_span_id from the job payload's trace_context object
  3. Include standard span attributes:
    • ci.job.id, ci.pipeline.id, ci.project.id
    • ci.runner.id, ci.runner.executor
    • ci.job.status (success/failed)
  4. Set span start/end times to the actual job execution boundaries
  5. This span becomes the parent for all subsequent instrumentation (build stages in Phase 2, CI Functions in Phase 3)
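
The span requirements above might translate into something like the following sketch. It uses only the standard library to show the ID handling and attribute set; a real implementation would hand these values to the OTEL SDK via LabKit. The parseHexID, parseTraceContext, jobInfo, and jobAttributes names are hypothetical, and the attribute keys are the ones listed in this issue. The validation assumes 16-byte trace IDs and 8-byte span IDs and rejects all-zero values, which OpenTelemetry treats as invalid.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"time"
)

// parseHexID decodes s into dst, rejecting wrong lengths and all-zero IDs.
func parseHexID(dst []byte, s string) error {
	if len(s) != 2*len(dst) {
		return fmt.Errorf("want %d hex characters, got %d", 2*len(dst), len(s))
	}
	if _, err := hex.Decode(dst, []byte(s)); err != nil {
		return err
	}
	for _, b := range dst {
		if b != 0 {
			return nil
		}
	}
	return errors.New("all-zero ID is invalid")
}

// parseTraceContext converts the payload's hex strings into fixed-size IDs.
func parseTraceContext(traceID, parentSpanID string) (tid [16]byte, sid [8]byte, err error) {
	if err = parseHexID(tid[:], traceID); err != nil {
		return tid, sid, fmt.Errorf("trace_id: %w", err)
	}
	if err = parseHexID(sid[:], parentSpanID); err != nil {
		return tid, sid, fmt.Errorf("parent_span_id: %w", err)
	}
	return tid, sid, nil
}

// jobInfo is a hypothetical carrier for the values the attributes need.
type jobInfo struct {
	JobID, PipelineID, ProjectID, RunnerID int64
	Executor                               string
	Succeeded                              bool
}

// jobAttributes builds the standard attribute set for the job_execution span.
func jobAttributes(j jobInfo) map[string]string {
	status := "failed"
	if j.Succeeded {
		status = "success"
	}
	return map[string]string{
		"ci.job.id":          strconv.FormatInt(j.JobID, 10),
		"ci.pipeline.id":     strconv.FormatInt(j.PipelineID, 10),
		"ci.project.id":      strconv.FormatInt(j.ProjectID, 10),
		"ci.runner.id":       strconv.FormatInt(j.RunnerID, 10),
		"ci.runner.executor": j.Executor,
		"ci.job.status":      status,
	}
}

func main() {
	start := time.Now() // taken when job execution actually begins
	// ... the job would run here ...
	end := time.Now() // taken when job execution actually ends

	tid, sid, err := parseTraceContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
	if err != nil {
		fmt.Println("telemetry disabled:", err)
		return
	}
	fmt.Printf("job_execution trace=%x parent=%x duration=%s\n", tid, sid, end.Sub(start))
	fmt.Println(jobAttributes(jobInfo{JobID: 1, PipelineID: 2, ProjectID: 3, RunnerID: 4, Executor: "docker", Succeeded: true}))
}
```

Validating the IDs before creating the span means a malformed trace_context degrades to "no telemetry" instead of producing a span that cannot be joined to its trace.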

Job Payload Contract

When features.job_telemetry is true, the job response includes:

{
  "features": {
    "job_telemetry": true
  },
  "trace_context": {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "parent_span_id": "00f067aa0ba902b7"
  }
}

See Job payload changes for full spec.

Success Criteria

  • A job_execution span appears in the otel_traces ClickHouse table with the correct trace_id
  • The span is visible in Grafana trace visualization
  • Job execution is unaffected when the OTEL Collector is unreachable
  • Jobs without features.job_telemetry produce no telemetry

Architecture Reference

  • Job payload changes
  • GitLab Runner changes

Related

  • Rails-side feature negotiation: gitlab#590588
  • Rails-side trace context: gitlab#590587
  • Built-in build stage instrumentation (Phase 2): #39230
  • CI Functions instrumentation (Phase 3): #39271
  • Parent epic: &20633
Edited Feb 19, 2026 by Pedro Pombeiro