CI/CD Observability: Tracing with OpenTelemetry
This page may contain information related to upcoming products, features, and functionality. The information is provided for informational purposes only; please do not rely on it for purchasing or planning decisions. As with all projects, the items mentioned on this page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
Release notes
Problem to solve
CI/CD pipelines are a key part of DevSecOps workflows, yet they often lack regular updates and iteration. Some consume too many resources; others are inefficient and could use a boost.
Pipelines are customized through the jobs and commands they run, which makes it hard to get insight into what is actually going on.
Add Pipeline Efficiency docs
The first iteration was to provide Pipeline Efficiency docs so that users have a solid basis to start from. Optimizing pipelines this way is time-consuming, and so is collecting metrics inside the pipelines themselves.
Data to collect
In addition to metrics, which are potentially tied to job duration in seconds, and the CI job logs, we also want a combination of both: traces with spans that add context to long-running jobs. Is it the `apt install` step or the compiler that consumes the most time and resources in a pipeline job? Visualizing artifact and cache collection might also be interesting.
Tools for CI/CD Observability?
OpenTracing and Jaeger Tracing are well known, and GitLab can already display traces through its Jaeger integration, but it cannot collect traces from CI/CD yet.
Honeycomb developed a small tool called buildevents which can be called in CI/CD scripts to send spans and traces to the Honeycomb server. It follows a generic format and relies heavily on environment variables and their manipulation; this complexity should be hidden from the user experience in a first iteration.
New framework: OpenTelemetry
Note: OTel is a short name for OpenTelemetry.
OpenTelemetry evolved over time into a specification with a collector interface and client SDK libraries.
This presentation from 2020 gives an overview of the framework and how tracing works: https://docs.google.com/presentation/d/1MAVFeSsTNVWC9wPGOlg83wh8GFtR9hPbdVHuumtgOWA/edit
The principle of OpenTelemetry is to provide a specification and collector (framework). The backends are known projects like Jaeger, Grafana Tempo, Prometheus, Elasticsearch, etc.
(Image credit: OpenTelemetry docs)
Tracing specification
- Traces specification: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/overview.md#traces
- Spans in OpenTelemetry: https://opentelemetry.lightstep.com/spans/
- https://www.jaegertracing.io/docs/1.25/architecture/
(Image credit: Jaeger docs)
User demand examples
- Hacker News topic "Faster GitLab CI/CD pipelines" - thread: https://news.ycombinator.com/item?id=29520577
- Twitter thread: https://twitter.com/__steele/status/1429681895533465604
- Elastic implemented CI/CD Observability as a feature
Customers:
- @k33g: "one of my customers is strongly interested in this topic ... it meets exactly his requirements" (internal link) - #338943 (comment 812222012)
Proposal
Implement OpenTelemetry clients in the GitLab server and runner to send traces and spans to an OpenTelemetry collector. The collector defines where the traces are stored; for testing purposes this will be Jaeger Tracing.
The implementation needs multiple steps in GitLab's architecture, covering the server (Ruby) and runner (Go) parts, which are explained in the sections below.
Similar to the Datadog integration in !46564 (diffs), the idea is to provide entry points in the CI/CD pipelines to send traces to an OpenTelemetry collector.
Additional resources
- GitLab feature discussion video with @jheimbuck_gl and @dnsmichi:
- Embracing Observability in CI/CD with OpenTelemetry slides by Cyrille le Clerc, Elastic
- Meeting with Cyrille, notes (internal) and recording (internal)
- Next steps discussion
- CI/CD Observability from Elastic docs
- Efficient DevSecOps Pipelines slide deck from Continuous Lifecycle, by @dnsmichi
Proposed Steps
Preparations: Dev Environment
- docker-compose setup with Jaeger, Prometheus, OTel
- k3s with OTel, Jaeger, Prometheus Operators
- Opstrace (ongoing)
OTel config in CI/CD variables
Define the names following the official specification: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/sdk-environment-variables.md
- Config settings for OpenTelemetry: Host, port, auth, exporter
Example from the Elastic documentation:
export OTEL_EXPORTER_OTLP_ENDPOINT="elastic-apm-server.example.com:8200"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer an_apm_secret_token"
export OTEL_TRACES_EXPORTER="otlp"
GitLab Runner - OpenTelemetry
- Use the Go SDK to implement OTel https://github.com/open-telemetry/opentelemetry-go
- Start with a standalone app (see the sketch below)
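A minimal sketch of such a standalone proof of concept with the Go SDK could look like the following. The service and span names are illustrative, and the OTLP gRPC exporter pointing at a local collector is an assumption rather than a decided design:

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// The OTLP exporter honors OTEL_EXPORTER_OTLP_ENDPOINT and
	// OTEL_EXPORTER_OTLP_HEADERS, so the CI/CD variables described above
	// can be passed through unchanged.
	exporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		log.Fatalf("failed to create OTLP exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		// Illustrative service name, not a decided one.
		sdktrace.WithResource(resource.NewSchemaless(
			attribute.String("service.name", "gitlab-runner"),
		)),
	)
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// Emit a single span that could represent one CI job execution.
	_, span := otel.Tracer("gitlab-runner/poc").Start(ctx, "ci-job")
	span.SetAttributes(attribute.String("ci.job.name", "build"))
	time.Sleep(200 * time.Millisecond) // stand-in for running the job
	span.End()
}
```

Using the batching span processor keeps exporting off the hot path of job execution, and the deferred `Shutdown` flushes buffered spans before the process exits.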
GitLab Server - OpenTelemetry
- Use the Ruby SDK to implement OTel https://github.com/open-telemetry/opentelemetry-ruby
- Follow the entry points from the Datadog integration: https://docs.gitlab.com/ee/integration/datadog.html & #270123
Connect Server with Runner - Trace ID
Pass the trace IDs from the server to the runner (a propagation sketch follows the list).
- Start Trace, generate ID
- Job start span
- Send Trace ID as environment variable (and OTel configuration as CI/CD variables) to runner
- Runner starts span for executor, uses trace ID
- Runner emits traces to OTel
- Runner finishes job, uploads artifacts, etc.
- Server updates trace end
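The hand-off itself could use the standard W3C trace context propagator, serialized into a CI/CD variable. The sketch below assumes a `TRACEPARENT` variable name, which is illustrative rather than a decided interface:

```go
package tracecontext

import (
	"context"
	"os"

	"go.opentelemetry.io/otel/propagation"
)

// W3C traceparent/tracestate format.
var propagator = propagation.TraceContext{}

// Inject would run on the GitLab server: it serializes the current span
// context into a map that can be exposed to the job as CI/CD variables.
func Inject(ctx context.Context) map[string]string {
	carrier := propagation.MapCarrier{}
	propagator.Inject(ctx, carrier)
	return carrier // e.g. {"traceparent": "00-<trace-id>-<span-id>-01"}
}

// Extract would run in the runner: it restores the server's trace context
// from the job environment, so executor spans become children of the
// pipeline/job trace started on the server.
func Extract(ctx context.Context) context.Context {
	carrier := propagation.MapCarrier{
		"traceparent": os.Getenv("TRACEPARENT"),
	}
	return propagator.Extract(ctx, carrier)
}
```

With this, every span the runner starts from the extracted context shares the trace ID generated on the server, so the whole pipeline shows up as a single trace.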
Context metadata enrichment
- Define which information is necessary to help debug problems.
- Review discussion with Cyrille in https://gitlab.com/gitlab-com/marketing/corporate_marketing/corporate-marketing/-/issues/5747+
- Enrich the span context with metadata (see the sketch below)
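As a starting point for that discussion, a hedged sketch of span enrichment based on GitLab's predefined CI/CD variables; the attribute keys are illustrative and would need to follow whatever naming convention gets agreed on:

```go
package enrichment

import (
	"os"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// EnrichSpan attaches CI/CD context from GitLab's predefined CI/CD variables
// to a span. The attribute keys are illustrative.
func EnrichSpan(span trace.Span) {
	span.SetAttributes(
		attribute.String("ci.pipeline.id", os.Getenv("CI_PIPELINE_ID")),
		attribute.String("ci.job.id", os.Getenv("CI_JOB_ID")),
		attribute.String("ci.job.stage", os.Getenv("CI_JOB_STAGE")),
		attribute.String("ci.project.path", os.Getenv("CI_PROJECT_PATH")),
		attribute.String("ci.commit.sha", os.Getenv("CI_COMMIT_SHA")),
	)
}
```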
User defined tracing
- Provide a CLI tool or CI keyword to trigger span/trace creation inside CI/CD jobs
- User specifies data - consider the parameter input, additional context, etc.
- Similar to release-cli/terraform - needs to be available in the executed job containers
- TODO: Dedicated issue
The Elastic documentation refers to otel-cli.
https://github.com/krzko/opentelemetry-shell offers a library to send traces, metrics, etc. from Shell scripts to OpenTelemetry.
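To make the idea more concrete, here is a rough Go sketch of such a helper in the spirit of otel-cli and buildevents: it wraps an arbitrary command in a span and records failures. The `ci-span` name, argument handling, and attribute are hypothetical, not a proposed interface:

```go
// Hypothetical helper in the spirit of otel-cli and buildevents: wrap an
// arbitrary command in a span, e.g. `ci-span compile make -j4`.
package main

import (
	"context"
	"log"
	"os"
	"os/exec"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	if len(os.Args) < 3 {
		log.Fatal("usage: ci-span <span-name> <command> [args...]")
	}
	spanName, command, args := os.Args[1], os.Args[2], os.Args[3:]

	ctx := context.Background()
	exporter, err := otlptracegrpc.New(ctx) // uses the OTEL_EXPORTER_OTLP_* variables
	if err != nil {
		log.Fatalf("exporter setup failed: %v", err)
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer func() { _ = tp.Shutdown(ctx) }() // flush spans before exiting
	otel.SetTracerProvider(tp)

	// Start a span around the wrapped command. Joining the surrounding job
	// trace via a propagated traceparent is omitted here for brevity, as is
	// propagating the command's exit code.
	_, span := otel.Tracer("ci-span").Start(ctx, spanName)
	defer span.End()

	cmd := exec.Command(command, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, "command failed")
		span.SetAttributes(attribute.Bool("ci.command.failed", true))
	}
}
```

A job could then call it as `ci-span compile make -j4`; joining the surrounding job trace would additionally require extracting the propagated trace context as sketched earlier.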
Security
- OTel environment variables can be set by
- Admin on the instance level (self-managed default)
- Owners/Maintainers
- Users
- Can variable values be overridden on a group/project level?
- Can a user override the values in their .gitlab-ci.yml configuration?
- SaaS: Multiple OTel endpoints
- Infrastructure team operating GitLab.com SaaS
- User-defined endpoints on a group level
- Performance?
Scaling
There are different architectures where this feature can be enabled.
- Mid-sized GitLab instance
- Small embedded hardware
- Large scale environment
- GitLab.com SaaS
Research and exploration will need benchmarks based on feature availability.
- Enabled by default?
- Performance impacts for Runner and Server
- Increased resource consumption (CPU, memory, etc.) for Runner and Server
Additional thoughts to consider
- How to debug problems introduced with and by this feature?
- How to review and accept future additions, e.g. more enriched metadata
- Does it need security reviews, e.g. before this is enabled on GitLab.com SaaS
- Development docs
- Development environment
CI/CD Observability does not stop when the deployment is done
Integrate with Kubernetes deployments, include the agent, Kubernetes itself, and the application being deployed.
Idea: find a way to link the trace IDs together and add a shared piece of metadata, for example the CI_ENVIRONMENT variable value, to all traces, so that application problems can be verified with environment filters. This filtering could happen in the Opstrace UI. A sketch of how the environment could be stamped on every trace follows.
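One hedged way to make that filter work is to stamp the environment at the resource level, so every span from a process carries it; the attribute key below is illustrative:

```go
package linking

import (
	"os"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// NewTracerProvider stamps the deployment environment onto every span emitted
// by this process, so CI traces, agent traces, and application traces can be
// correlated with a single environment filter in the tracing UI.
func NewTracerProvider(exp sdktrace.SpanExporter) *sdktrace.TracerProvider {
	return sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewSchemaless(
			// CI_ENVIRONMENT_NAME is set by GitLab for jobs with an environment;
			// the attribute key itself is illustrative.
			attribute.String("ci.environment.name", os.Getenv("CI_ENVIRONMENT_NAME")),
		)),
	)
}
```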
Agent for Kubernetes
- Implement OTel tracing
- Can it act as an OpenTelemetry collector / forwarder / tunnel to the GitLab server?
- TODO: Create dedicated issue.
Limitations and Scope
A possible problem with the runner enabling OpenTelemetry in jobs is network limitations: the GitLab server typically runs in the same network segment as the OpenTelemetry backends, but the runner might not. The runner may need to cache the traces and send them to the server, which then forwards them to OpenTelemetry.
Pipelines and jobs need to send a defined set of traces by default; user customization in CI/CD pipelines is desired, e.g. a keyword in the YAML configuration that adds a span to a trace context and then closes it. That likely needs GitLab Runner-specific implementation and is out of scope for this first MVC.
OpenTelemetry supports traces as GA. In the future it will also support metrics and log events; metrics will be tracked as a separate feature proposal.
Use Cases
CI/CD Observability dashboard
Build a CI/CD Observability dashboard that shows pipeline and job execution as a tracing dashboard. A span with a start/end time gives insight into the job duration and provides additional metadata for context. This can help identify long-running jobs, or immediately visualize external problems.
Better insights for support and professional services teams
Analyzing long-running jobs in detail helps optimize pipelines for more efficiency and reduced cost.
Tracing `script` command sections can help replace log parsing tools like https://gitlab.com/gitlab-com/support/toolbox/list-slowest-job-sections/
Dogfooding on GitLab.com
Enable tracing for CI/CD and analyse pipelines for selected projects.
Needs performance analysis, and potentially multi-tenant environments as OTel collectors.
Opstrace Tracing
- Use a multi-tenant Opstrace Tracing collector with OTel.
- Demo video, 2022-01-17: https://www.youtube.com/watch?v=IjW9d-UpARs
- Integrate the Opstrace UI for Tracing into GitLab CI/CD dashboards.
- Evaluate options for self-managed and SaaS to provide this functionality out-of-the-box.
Integrate with Datadog
Adopting OpenTelemetry on the client side allows specifying an OTel endpoint, which can also be a vendor endpoint such as Datadog. https://docs.datadoghq.com/tracing/setup_overview/open_standards/#opentelemetry-collector-datadog-exporter
To be defined: depending on the implementation going forward, a unified OpenTelemetry interface may be able to supersede the existing Datadog integration.
Example Implementations
OpenTelemetry has a registry with examples: https://opentelemetry.io/registry/
- Kubernetes: https://kubernetes.io/docs/concepts/cluster-administration/system-traces/
- AWS: https://aws-otel.github.io/docs/setup/eks
CI specific implementations:
- Jenkins (Java): https://plugins.jenkins.io/opentelemetry/
- Teamcity (Java): https://github.com/OctopusDeploy/opentelemetry-teamcity-plugin
Intended users
- Sasha (Software Developer)
- Devon (DevOps Engineer)
- Sidney (Systems Administrator)
- Simone (Software Engineer in Test)
- Allison (Application Ops)
- Priyanka (Platform Engineer)
User experience goal
When the OpenTelemetry integration is enabled, the job traces are sent automatically. The documentation needs to provide examples of how to set up the collector with Jaeger as the backend and frontend for traces - OpenTelemetry is a complex framework, and the time to success needs to be short.
When Opstrace is available, it should be detected out-of-the-box as the OTel endpoint, showing CI/CD Observability dashboards.
Further details
- https://www.cmg.org/wp-content/uploads/2021/02/eBook_GuideToOpenTelemetry.pdf
- https://docs.datadoghq.com/tracing/setup_overview/open_standards/
- https://kubernetes.io/blog/2021/09/03/api-server-tracing/
- https://jenkins-x.io/blog/2021/04/08/jx3-pipeline-trace/
- https://concourse-ci.org/tracing.html
- https://www.honeycomb.io/blog/working-on-hitting-a-release-cadence-ci-cd-observability-can-help-you-get-there/
Permissions and Security
Enabling the OpenTelemetry integration should be available to both instance admins and group/project owners and maintainers in the settings.
- Add expected impact to members with no access (0)
- Add expected impact to Guest (10) members
- Add expected impact to Reporter (20) members
- Add expected impact to Developer (30) members
- Add expected impact to Maintainer (40) members
- Add expected impact to Owner (50) members
Documentation
See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/workflow.html#for-a-product-change
- Add all known Documentation Requirements in this section. See https://docs.gitlab.com/ee/development/documentation/workflow.html
- If this feature requires changing permissions, update the permissions document. See https://docs.gitlab.com/ee/user/permissions.html
Availability & Testing
This section needs to be retained and filled in during the workflow planning breakdown phase of this feature proposal, if not earlier.
What risks does this change pose to our availability? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing?
Please list the test areas (unit, integration, and end-to-end) that need to be added or updated to ensure that this feature will work as intended. Please use the list below as guidance.
- Unit test changes
- Integration test changes
- End-to-end test changes
See the test engineering planning process and reach out to your counterpart Software Engineer in Test for assistance: https://about.gitlab.com/handbook/engineering/quality/test-engineering/#test-planning
What does success look like, and how can we measure that?
- Users start debugging and optimizing their CI/CD workflows with OpenTelemetry.
- Add integration-enabled tracking to see how often this is used.
- Vendor support and documentation (OTel receiver endpoints in SaaS platforms such as Datadog, Dynatrace, etc.)
What is the type of buyer?
Free tier: provide pipeline insights for everyone, with the backend sending traces to a configured endpoint.
Premium+ tier:
- Visualization / UX with advanced group dashboards
- Default alerting rules and on-call settings
- Optimization tips based on existing data
Is this a cross-stage feature?
~"group::pipeline execution", ~"group::runner", and ~"group::monitor" are required.
Implementation Scope
Evaluate https://gitlab.com/gitlab-org/labkit-ruby as the single source of truth (SSoT) for instrumentation.
Limit the implementation to using and describing:
- OpenTelemetry
- Jaeger Tracing (Traces)
Future test cases:
- Opstrace integration, depending on integration milestones
FYI @jreporter @DarrenEastman @kencjohnston @andrewn @gitlab-de