Feature negotiation, OTLP export client, and first job_execution span
## Summary

Implement feature negotiation, the OTLP export client, and emit the first `job_execution` span as proof of end-to-end integration. This is the first step of Runner instrumentation (basic instrumentation: end-to-end integration + built-in stage spans).

## Goals

1. Read the `features.tracing` object from the job payload
2. Configure an OTLP exporter via [LabKit](https://gitlab.com/gitlab-org/labkit) targeting the OTEL Collector endpoint(s) from the job payload
3. Emit a single `job_execution` span that covers the full job lifecycle
4. Validate the end-to-end pipeline: Runner → OTEL Collector → ClickHouse → Grafana

## Requirements

### Feature Negotiation

1. Read `features.tracing` from the job payload response. Runner already supports `GitlabFeatures`, so this is adding a new feature key
2. Only initialize telemetry when `features.tracing` is present and contains `otel_endpoints`
3. Read `features.tracing.trace_id` and `features.tracing.span_parent_id` from the job payload
4. Gracefully handle missing fields (assume telemetry disabled)

### OTLP Export Client

1. Use [LabKit](https://gitlab.com/gitlab-org/labkit) for OTEL SDK integration; this aligns Runner's telemetry with the rest of the GitLab platform
2. **OTEL Collector endpoints come from Rails** in `features.tracing.otel_endpoints` (array of objects, max 2 entries). Each entry has a `url` and optional `auth` configuration ([endpoint auth schema](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#endpoint-auth-schema)). No static runner manager `config.toml` configuration needed. The Runner configures one OTLP exporter per endpoint and exports in parallel. The first entry is the GitLab-managed Collector (Rails app setting); an optional second entry supports BYO OTLP destinations (customer-configured at the project/group level).
3. When `auth` is present on an endpoint, authenticate accordingly. For `http_bearer` auth type, fetch a token from `token_url` (with `token_request_headers`) and attach it as `Authorization: Bearer <token>`. When `auth` is absent, export without authentication.
4. **Use `AlwaysOn` SDK sampling**: override LabKit's default `TraceIDRatioBased(0.01)` sampler with `Config{SampleRate: 1.0}`. Sampling decisions are made by Rails (not the SDK). If `features.tracing` is present, the pipeline was already sampled; the Runner instruments everything.
5. Handle connection failures gracefully: telemetry failures must never fail jobs
6. Buffer spans locally with retry; drop after timeout
7. Respect backpressure from the OTEL Collector

### First `job_execution` Span

1. Create a single span named `job_execution` that wraps the entire job lifecycle
2. Set `trace_id` from `features.tracing.trace_id`; set `parent_span_id` from `features.tracing.span_parent_id` (optional: only present for child pipeline jobs, derived from the bridge job DB ID; absent for top-level jobs where Runner creates root-level spans)
3. Include standard span attributes:
   - `ci.job.id`, `ci.pipeline.id`, `ci.project.id`
   - `ci.pipeline.source` (e.g. `push`, `merge_request_event`, `schedule`, `parent_pipeline`)
   - `ci.runner.id`, `ci.runner.executor`
   - `ci.job.status` (success/failed)
4. Set span start/end times to the actual job execution boundaries
5. This span becomes the parent for all subsequent instrumentation (build stages, CI Functions)

## Job Payload Contract

When `features.tracing` is present, the job response includes:

```json
{
  "features": {
    "tracing": {
      "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
      "span_parent_id": "00f067aa0ba902b7",
      "otel_endpoints": [
        {
          "url": "https://otel-collector.gitlab.example.com:4318",
          "auth": {
            "type": "http_bearer",
            "http_bearer": {
              "token_url": "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=ci-runner-traces",
              "token_request_headers": {
                "Metadata-Flavor": "Google"
              }
            }
          }
        }
      ]
    }
  }
}
```

- `trace_id`: Pipeline-scoped W3C trace ID (32 hex chars), generated by Rails
- `span_parent_id`: Optional; present only for child pipeline jobs (hex-encoded bridge job DB ID); absent for top-level pipeline jobs
- `otel_endpoints`: Array of 1–2 endpoint objects, each with `url` (OTLP/HTTP endpoint URL) and optional `auth` (typed auth configuration; see [endpoint auth schema](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#endpoint-auth-schema))

See [Job payload changes](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#job-payload-changes) for full spec.
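
The Feature Negotiation rules, applied to the payload shape above, could be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the Runner's implementation: the type and function names (`TracingFeature`, `OTELEndpoint`, `ParseTracing`, and so on) are hypothetical, only the standard library is used, and treating a missing `trace_id` as "telemetry disabled" is an assumption drawn from the "gracefully handle missing fields" requirement.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring the payload contract above. The names are
// illustrative and are not taken from the Runner codebase.
type HTTPBearerAuth struct {
	TokenURL            string            `json:"token_url"`
	TokenRequestHeaders map[string]string `json:"token_request_headers"`
}

type EndpointAuth struct {
	Type       string          `json:"type"`
	HTTPBearer *HTTPBearerAuth `json:"http_bearer,omitempty"`
}

type OTELEndpoint struct {
	URL  string        `json:"url"`
	Auth *EndpointAuth `json:"auth,omitempty"` // nil means export without authentication
}

type TracingFeature struct {
	TraceID       string         `json:"trace_id"`
	SpanParentID  string         `json:"span_parent_id,omitempty"` // empty for top-level pipeline jobs
	OTELEndpoints []OTELEndpoint `json:"otel_endpoints"`
}

// jobPayload models only the part of the job response used here.
type jobPayload struct {
	Features struct {
		Tracing *TracingFeature `json:"tracing"`
	} `json:"features"`
}

// ParseTracing reports whether telemetry should be initialized for a job.
// It returns false when features.tracing is absent, when otel_endpoints is
// empty, or (as an assumption of this sketch) when trace_id is missing.
// Unparseable input is also treated as "telemetry disabled".
func ParseTracing(raw []byte) (*TracingFeature, bool) {
	var p jobPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, false
	}
	t := p.Features.Tracing
	if t == nil || len(t.OTELEndpoints) == 0 || t.TraceID == "" {
		return nil, false
	}
	return t, true
}

func main() {
	payload := []byte(`{"features":{"tracing":{
		"trace_id":"4bf92f3577b34da6a3ce929d0e0e4736",
		"otel_endpoints":[{"url":"https://otel-collector.gitlab.example.com:4318"}]}}}`)

	if t, ok := ParseTracing(payload); ok {
		fmt.Println("tracing enabled, endpoints:", len(t.OTELEndpoints))
	} else {
		fmt.Println("tracing disabled")
	}
}
```

Keeping `Tracing` and `Auth` as pointers lets the caller distinguish "field absent" from "field present but empty", which is what the negotiation rules depend on. The sketch does not cover the exporter setup, the bearer-token fetch, or the two-entry limit on `otel_endpoints`.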
## Success Criteria

- A `job_execution` span appears in the `otel_traces` ClickHouse table with the correct `trace_id`
- The span is visible in Grafana trace visualization
- Job execution is unaffected when the OTEL Collector is unreachable
- Jobs without `features.tracing` produce no telemetry

## Architecture Reference

- [Job payload changes](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#job-payload-changes)
- [Endpoint auth schema](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#endpoint-auth-schema)
- [GitLab Runner changes](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#gitlab-runner-changes)
- [Sampling strategy](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ci_job_telemetry/#sampling-strategy)

## Related

- Rails-side feature negotiation: gitlab-org/gitlab#590588
- Rails-side trace context: gitlab-org/gitlab#590587
- OTEL Collector endpoint app setting: gitlab-org/gitlab#591941
- Sampling rate app setting: gitlab-org/gitlab#593834
- Built-in build stage instrumentation: gitlab-org/gitlab-runner#39230
- CI Functions instrumentation: gitlab-org/gitlab-runner#39271
- Parent epic: gitlab-org&20633