Feature negotiation, OTLP export client, and first job_execution span
Summary
Implement feature negotiation and the OTLP export client, and emit the first job_execution span as proof of end-to-end integration. This is Phase 1 of Runner instrumentation.
Goals
- Read the job_telemetry feature flag and trace_context from the job payload
- Configure an OTLP exporter via LabKit targeting the OTEL Collector endpoint
- Emit a single job_execution span that covers the full job lifecycle
- Validate the end-to-end pipeline: Runner → OTEL Collector → ClickHouse → Grafana
Requirements
Feature Negotiation
- Read features.job_telemetry from the job payload response; Runner already supports GitlabFeatures, so this is adding a new feature key
- Only initialize telemetry when job_telemetry is true
- Read trace_context.trace_id and trace_context.parent_span_id from the job payload
- Gracefully handle missing fields (assume telemetry disabled)
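
The negotiation rules above can be sketched in Go roughly as follows. This is a minimal illustration, not Runner's actual code: the jobPayload type and NegotiateTelemetry function are hypothetical names, and the sketch does not use Runner's existing GitlabFeatures type. It also assumes that a missing trace_context counts as a "missing field" and therefore disables telemetry.

```go
package main

import "encoding/json"

// TraceContext mirrors the trace_context object in the job payload.
type TraceContext struct {
	TraceID      string `json:"trace_id"`
	ParentSpanID string `json:"parent_span_id"`
}

// jobPayload is a hypothetical type holding only the fields this sketch needs.
// Pointers distinguish "field absent" from "field present but zero".
type jobPayload struct {
	Features *struct {
		JobTelemetry bool `json:"job_telemetry"`
	} `json:"features"`
	TraceContext *TraceContext `json:"trace_context"`
}

// NegotiateTelemetry returns the trace context and true only when
// features.job_telemetry is true and both trace identifiers are present.
// Malformed or incomplete input yields (nil, false), i.e. telemetry disabled.
func NegotiateTelemetry(raw []byte) (*TraceContext, bool) {
	var p jobPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, false
	}
	if p.Features == nil || !p.Features.JobTelemetry {
		return nil, false
	}
	tc := p.TraceContext
	if tc == nil || tc.TraceID == "" || tc.ParentSpanID == "" {
		return nil, false
	}
	return tc, true
}
```

Returning a plain boolean instead of an error keeps the caller's decision simple: either telemetry is initialized with a usable trace context, or it is skipped entirely and the job proceeds as before.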
OTLP Export Client
- Use LabKit for OTEL SDK integration — aligns Runner's telemetry with the rest of the GitLab platform
- OTEL Collector endpoint is a static runner manager configuration (e.g., GITLAB_TRACING connection string or dedicated config field)
- Authenticate using OIDC/workload identity token on GitLab.com hosted runners
- Handle connection failures gracefully — telemetry failures must never fail jobs
- Buffer spans locally with retry; drop after timeout
- Respect backpressure from the OTEL Collector
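
One way to picture the buffering and failure-isolation requirements is a bounded queue whose enqueue never blocks, plus an export loop with a fixed retry budget. The sketch below uses only the Go standard library. Span, ExportFunc, SpanBuffer, and ExportWithRetry are hypothetical names for illustration, not LabKit or OTEL SDK APIs. It assumes plain exponential backoff and does not model retry hints sent by the collector. A real exporter would presumably also accept a context.Context so an in-flight call can be cancelled.

```go
package main

import (
	"errors"
	"time"
)

// Span is a placeholder for whatever span type the exporter accepts.
type Span struct{ Name string }

// ExportFunc sends one batch to the collector (hypothetical signature).
type ExportFunc func(batch []Span) error

// SpanBuffer is a bounded in-memory queue of finished spans.
type SpanBuffer struct{ ch chan Span }

func NewSpanBuffer(capacity int) *SpanBuffer {
	return &SpanBuffer{ch: make(chan Span, capacity)}
}

// Enqueue never blocks the job: it reports false and drops the span
// when the buffer is full.
func (b *SpanBuffer) Enqueue(s Span) bool {
	select {
	case b.ch <- s:
		return true
	default:
		return false
	}
}

// Drain removes up to limit buffered spans without blocking.
func (b *SpanBuffer) Drain(limit int) []Span {
	var batch []Span
	for len(batch) < limit {
		select {
		case s := <-b.ch:
			batch = append(batch, s)
		default:
			return batch
		}
	}
	return batch
}

// ErrDropped reports that a batch was abandoned once the retry budget ran out.
var ErrDropped = errors.New("telemetry batch dropped after timeout")

// ExportWithRetry retries with exponential backoff until export succeeds or
// the next wait would exceed the budget. The returned error is informational:
// callers are expected to log it and continue, never to fail the job.
func ExportWithRetry(export ExportFunc, batch []Span, budget, initialBackoff time.Duration) error {
	deadline := time.Now().Add(budget)
	backoff := initialBackoff
	for {
		if err := export(batch); err == nil {
			return nil
		}
		if time.Now().Add(backoff).After(deadline) {
			return ErrDropped
		}
		time.Sleep(backoff)
		backoff *= 2
	}
}
```

In this design ExportWithRetry would run in its own goroutine, off the job's critical path, so an unreachable collector costs at most the retry budget in background time and a dropped batch.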
First job_execution Span
- Create a single span named job_execution that wraps the entire job lifecycle
- Set trace_id and parent_span_id from the job payload's trace_context object
- Include standard span attributes:
  - ci.job.id, ci.pipeline.id, ci.project.id
  - ci.runner.id, ci.runner.executor
  - ci.job.status (success/failed)
- Set span start/end times to the actual job execution boundaries
- This span becomes the parent for all subsequent instrumentation (build stages in Phase 2, CI Functions in Phase 3)
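
As a rough illustration of the span requirements, the sketch below decodes the hex identifiers and assembles the attribute set using only the Go standard library. JobSpan, JobInfo, StartJobSpan, and Finish are hypothetical stand-ins for whatever the LabKit/OTEL SDK integration provides. The validation assumes W3C Trace Context sizing: a 16-byte trace ID and an 8-byte span ID, neither of which may be all zeros.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"time"
)

type TraceID [16]byte
type SpanID [8]byte

// parseID decodes a hex string into dst, rejecting wrong lengths,
// non-hex input, and all-zero identifiers.
func parseID(dst []byte, s string) error {
	if len(s) != 2*len(dst) {
		return fmt.Errorf("want %d hex chars, got %d", 2*len(dst), len(s))
	}
	if _, err := hex.Decode(dst, []byte(s)); err != nil {
		return err
	}
	for _, b := range dst {
		if b != 0 {
			return nil
		}
	}
	return fmt.Errorf("all-zero ID is invalid")
}

// JobInfo carries the values behind the standard span attributes (hypothetical).
type JobInfo struct {
	JobID, PipelineID, ProjectID, RunnerID int64
	Executor                               string
}

// JobSpan is a hypothetical stand-in for the SDK's span type.
type JobSpan struct {
	Name       string
	TraceID    TraceID
	ParentID   SpanID
	Start, End time.Time
	Attributes map[string]any
}

// StartJobSpan opens the job_execution span at the start of job execution,
// parented to the IDs taken from the payload's trace_context.
func StartJobSpan(traceID, parentSpanID string, job JobInfo) (*JobSpan, error) {
	s := &JobSpan{Name: "job_execution", Start: time.Now()}
	if err := parseID(s.TraceID[:], traceID); err != nil {
		return nil, fmt.Errorf("trace_id: %w", err)
	}
	if err := parseID(s.ParentID[:], parentSpanID); err != nil {
		return nil, fmt.Errorf("parent_span_id: %w", err)
	}
	s.Attributes = map[string]any{
		"ci.job.id":          job.JobID,
		"ci.pipeline.id":     job.PipelineID,
		"ci.project.id":      job.ProjectID,
		"ci.runner.id":       job.RunnerID,
		"ci.runner.executor": job.Executor,
	}
	return s, nil
}

// Finish records the final job status ("success" or "failed") and the end time.
func (s *JobSpan) Finish(status string) {
	s.Attributes["ci.job.status"] = status
	s.End = time.Now()
}
```

A caller would invoke StartJobSpan when the job begins and Finish when it completes. If StartJobSpan returns an error, the caller would skip telemetry for that job rather than fail it.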
Job Payload Contract
When features.job_telemetry is true, the job response includes:
{
  "features": {
    "job_telemetry": true
  },
  "trace_context": {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "parent_span_id": "00f067aa0ba902b7"
  }
}
See Job payload changes for full spec.
Success Criteria
- A job_execution span appears in the otel_traces ClickHouse table with the correct trace_id
- The span is visible in Grafana trace visualization
- Job execution is unaffected when the OTEL Collector is unreachable
- Jobs without features.job_telemetry produce no telemetry
Architecture Reference
Related
- Rails-side feature negotiation: gitlab#590588
- Rails-side trace context: gitlab#590587
- Built-in build stage instrumentation (Phase 2): #39230
- CI Functions instrumentation (Phase 3): #39271
- Parent epic: &20633
Edited by Pedro Pombeiro