Feature negotiation, OTLP export client, and first job_execution span
## Summary
Implement feature negotiation and the OTLP export client, and emit a first `job_execution` span to prove end-to-end integration. This is the first step of basic Runner instrumentation, which covers end-to-end integration and built-in stage spans.
## Goals
1. Read the `features.tracing` object from the job payload
2. Use [LabKit](https://gitlab.com/gitlab-org/labkit) to configure one OTLP exporter per OTEL Collector endpoint listed in the job payload
3. Emit a single `job_execution` span that covers the full job lifecycle
4. Validate the end-to-end pipeline: Runner → OTEL Collector → ClickHouse → Grafana
## Requirements
### Feature Negotiation
1. Read `features.tracing` from the job payload response. Runner already supports `GitlabFeatures`, so this only adds a new feature key
2. Only initialize telemetry when `features.tracing` is present and contains `otel_endpoints`
3. Read `features.tracing.trace_id` and `features.tracing.span_parent_id` from the job payload
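4. Handle missing fields gracefully: if `features.tracing`, `otel_endpoints`, or `trace_id` is absent, treat telemetry as disabled. A missing `span_parent_id` is expected for top-level jobs and does not disable telemetry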
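A minimal sketch of the negotiation step in Go, using only `encoding/json`. The type names (`TracingFeature`, `OTelEndpoint`, `EndpointAuth`, `HTTPBearerAuth`) and the function `tracingFromJobResponse` are hypothetical; in Runner the new struct would more likely become a field on the existing `GitlabFeatures` type. What it illustrates is that every failure path returns `nil` (telemetry disabled) instead of an error, while an absent `span_parent_id` is accepted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical Go types mirroring the documented features.tracing payload.
// In Runner these would more likely hang off the existing GitlabFeatures type.
type HTTPBearerAuth struct {
	TokenURL            string            `json:"token_url"`
	TokenRequestHeaders map[string]string `json:"token_request_headers"`
}

type EndpointAuth struct {
	Type       string          `json:"type"` // e.g. "http_bearer"
	HTTPBearer *HTTPBearerAuth `json:"http_bearer,omitempty"`
}

type OTelEndpoint struct {
	URL  string        `json:"url"`
	Auth *EndpointAuth `json:"auth,omitempty"` // nil: export without authentication
}

type TracingFeature struct {
	TraceID       string         `json:"trace_id"`
	SpanParentID  string         `json:"span_parent_id,omitempty"` // empty for top-level jobs
	OTelEndpoints []OTelEndpoint `json:"otel_endpoints"`
}

type jobResponse struct {
	Features struct {
		Tracing *TracingFeature `json:"tracing"`
	} `json:"features"`
}

// tracingFromJobResponse returns the tracing configuration, or nil when
// telemetry should stay disabled. It never returns an error: a malformed or
// incomplete tracing block disables telemetry instead of failing the job.
func tracingFromJobResponse(raw []byte) *TracingFeature {
	var resp jobResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil
	}
	t := resp.Features.Tracing
	if t == nil || t.TraceID == "" || len(t.OTelEndpoints) == 0 {
		return nil
	}
	return t
}

func main() {
	raw := []byte(`{"features":{"tracing":{
		"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736",
		"otel_endpoints":[{"url":"https://otel-collector.gitlab.example.com:4318"}]}}}`)
	if t := tracingFromJobResponse(raw); t != nil {
		fmt.Println("tracing enabled for trace", t.TraceID, "with", len(t.OTelEndpoints), "endpoint(s)")
	}
	// No features.tracing block: no telemetry.
	fmt.Println(tracingFromJobResponse([]byte(`{"features":{}}`)) == nil)
}
```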
### OTLP Export Client
1. Use [LabKit](https://gitlab.com/gitlab-org/labkit) for OTEL SDK integration — aligns Runner's telemetry with the rest of the GitLab platform
2. **OTEL Collector endpoints come from Rails** in `features.tracing.otel_endpoints` (an array of objects, at most 2 entries). Each entry has a `url` and an optional `auth` configuration ([endpoint auth schema](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#endpoint-auth-schema)). No static `config.toml` configuration is needed on the runner manager. The Runner configures one OTLP exporter per endpoint and exports to all of them in parallel. The first entry is the GitLab-managed Collector (a Rails application setting); the optional second entry supports bring-your-own (BYO) OTLP destinations configured by the customer at the project or group level.
3. When `auth` is present on an endpoint, authenticate accordingly. For `http_bearer` auth type, fetch a token from `token_url` (with `token_request_headers`) and attach it as `Authorization: Bearer <token>`. When `auth` is absent, export without authentication.
4. **Use `AlwaysOn` SDK sampling** — override LabKit's default `TraceIDRatioBased(0.01)` sampler with `Config{SampleRate: 1.0}`. Sampling decisions are made by Rails (not the SDK). If `features.tracing` is present, the pipeline was already sampled; the Runner instruments everything.
5. Handle connection failures gracefully — telemetry failures must never fail jobs
6. Buffer spans locally and retry failed exports; drop them once a timeout expires
7. Respect backpressure from the OTEL Collector
### First `job_execution` Span
1. Create a single span named `job_execution` that wraps the entire job lifecycle
2. Set `trace_id` from `features.tracing.trace_id`; set `parent_span_id` from `features.tracing.span_parent_id` (optional — only present for child pipeline jobs, derived from the bridge job DB ID; absent for top-level jobs where Runner creates root-level spans)
3. Include standard span attributes:
   - `ci.job.id`, `ci.pipeline.id`, `ci.project.id`
   - `ci.pipeline.source` (e.g. `push`, `merge_request_event`, `schedule`, `parent_pipeline`)
   - `ci.runner.id`, `ci.runner.executor`
   - `ci.job.status` (`success` or `failed`)
4. Set span start/end times to the actual job execution boundaries
5. This span becomes the parent for all subsequent instrumentation (build stages, CI Functions)
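Before creating the span, the Runner has to turn the two hex strings from the payload into trace and span IDs and decide whether `job_execution` is a child or a root span. The sketch below does that with `encoding/hex` and builds the attribute set listed above. `jobSpanContext`, `jobInfo`, and the helper functions are hypothetical names, and treating a malformed ID as a reason to disable telemetry (rather than fail the job) is an assumption consistent with the failure-handling requirements.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// parseHexID decodes an n-byte W3C identifier (2n hex chars) and rejects the
// all-zero value, which W3C Trace Context treats as invalid.
func parseHexID(s string, n int) ([]byte, error) {
	if len(s) != 2*n {
		return nil, fmt.Errorf("want %d hex chars, got %d", 2*n, len(s))
	}
	b, err := hex.DecodeString(s)
	if err != nil {
		return nil, err
	}
	for _, c := range b {
		if c != 0 {
			return b, nil
		}
	}
	return nil, errors.New("all-zero ID is invalid")
}

// jobSpanContext is a hypothetical holder for the IDs the job_execution span uses.
type jobSpanContext struct {
	TraceID      [16]byte
	ParentSpanID [8]byte
	HasParent    bool // false for top-level jobs: job_execution is a root span
}

// spanContextFromPayload decodes trace_id and the optional span_parent_id.
// Assumption: any error here means "disable telemetry", never "fail the job".
func spanContextFromPayload(traceID, spanParentID string) (jobSpanContext, error) {
	var sc jobSpanContext
	tid, err := parseHexID(traceID, 16)
	if err != nil {
		return sc, fmt.Errorf("trace_id: %w", err)
	}
	copy(sc.TraceID[:], tid)
	if spanParentID == "" {
		return sc, nil
	}
	pid, err := parseHexID(spanParentID, 8)
	if err != nil {
		return sc, fmt.Errorf("span_parent_id: %w", err)
	}
	copy(sc.ParentSpanID[:], pid)
	sc.HasParent = true
	return sc, nil
}

// jobInfo is a hypothetical summary of the job fields the span reports.
type jobInfo struct {
	JobID, PipelineID, ProjectID, RunnerID int64
	PipelineSource, Executor               string
	Failed                                 bool
}

// jobExecutionAttributes builds the standard job_execution attribute set.
func jobExecutionAttributes(j jobInfo) map[string]any {
	status := "success"
	if j.Failed {
		status = "failed"
	}
	return map[string]any{
		"ci.job.id":          j.JobID,
		"ci.pipeline.id":     j.PipelineID,
		"ci.project.id":      j.ProjectID,
		"ci.pipeline.source": j.PipelineSource,
		"ci.runner.id":       j.RunnerID,
		"ci.runner.executor": j.Executor,
		"ci.job.status":      status,
	}
}

func main() {
	sc, err := spanContextFromPayload("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
	if err != nil {
		fmt.Println("telemetry disabled:", err)
		return
	}
	fmt.Println("trace:", hex.EncodeToString(sc.TraceID[:]),
		"parent:", hex.EncodeToString(sc.ParentSpanID[:]), "child span:", sc.HasParent)
	attrs := jobExecutionAttributes(jobInfo{JobID: 42, PipelineSource: "push", Executor: "docker"})
	fmt.Println("ci.job.status =", attrs["ci.job.status"])
}
```

With the OpenTelemetry Go SDK, a parented span would typically be started from a context built with `trace.NewSpanContext` and `trace.ContextWithRemoteSpanContext`, with `trace.WithTimestamp` setting the real start and end times. A root span that must still reuse the Rails-generated `trace_id` cannot inherit it from a parent, so it likely needs something like a custom `IDGenerator`; that detail is left open here.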
## Job Payload Contract
When `features.tracing` is present, the job response includes:
```json
{
  "features": {
    "tracing": {
      "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
      "span_parent_id": "00f067aa0ba902b7",
      "otel_endpoints": [
        {
          "url": "https://otel-collector.gitlab.example.com:4318",
          "auth": {
            "type": "http_bearer",
            "http_bearer": {
              "token_url": "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=ci-runner-traces",
              "token_request_headers": { "Metadata-Flavor": "Google" }
            }
          }
        }
      ]
    }
  }
}
```
- `trace_id`: Pipeline-scoped W3C trace ID (32 hex chars), generated by Rails
- `span_parent_id`: Optional — present only for child pipeline jobs (hex-encoded bridge job DB ID); absent for top-level pipeline jobs
- `otel_endpoints`: Array of 1–2 endpoint objects, each with `url` (OTLP/HTTP endpoint URL) and optional `auth` (typed auth configuration — see [endpoint auth schema](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#endpoint-auth-schema))
See [Job payload changes](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#job-payload-changes) for the full specification.
## Success Criteria
- A `job_execution` span appears in the `otel_traces` ClickHouse table with the correct `trace_id`
- The span is visible in Grafana trace visualization
- Job execution is unaffected when the OTEL Collector is unreachable
- Jobs without `features.tracing` produce no telemetry
## Architecture Reference
- [Job payload changes](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#job-payload-changes)
- [Endpoint auth schema](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#endpoint-auth-schema)
- [GitLab Runner changes](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#gitlab-runner-changes)
- [Sampling strategy](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#sampling-strategy)
## Related
- Rails-side feature negotiation: gitlab-org/gitlab#590588
- Rails-side trace context: gitlab-org/gitlab#590587
- OTEL Collector endpoint app setting: gitlab-org/gitlab#591941
- Sampling rate app setting: gitlab-org/gitlab#593834
- Built-in build stage instrumentation: gitlab-org/gitlab-runner#39230
- CI Functions instrumentation: gitlab-org/gitlab-runner#39271
- Parent epic: gitlab-org&20633