Tracing with Jaeger

Description

Every application needs performance monitoring, so we should make it part of Auto DevOps.

Proposal

Architecture:

  1. APM agent: Datadog https://docs.datadoghq.com/libraries/#apm-tracing-client-libraries (open source, uses OpenTracing https://www.datadoghq.com/blog/opentracing-datadog-cncf/)
  2. Collector: Jaeger collector https://github.com/jaegertracing/jaeger/tree/master/cmd/collector (we'll be able to reuse that when we start to do tracing https://gitlab.com/gitlab-org/gitlab-ee/issues/3458 )
  3. Storage: Elasticsearch (Jaeger only supports Cassandra and ES, and we already use ES for search)
  4. Backend: GitLab Rails (already interfaces with ES for search, we want one application)
  5. UI: GitLab Vue (show the graphs alongside the Prometheus metrics)

/cc @markpundsack @joshlambert @bjk-gitlab

Notes

Need a collector, query service

Zipkin or Jaeger

Jaeger is popular, but are its contributors Uber employees only? Also, of the two, only Jaeger is a CNCF project.

We would need to change the wire protocol for Jaeger OpenTracing to Datadog.

Nice if we want to use Jaeger for non-APM tracer later

Uber had a problem with multiple languages

https://eng.uber.com/wp-content/uploads/2017/02/4-5-EngBlog-Distributed-Tracing-at-Uber-768x432.png looks like how I would make it, Zipkin for UI and query

Zipkin uses thrift?

Even though the Zipkin backend was fairly well known and popular, it lacked a good story on the instrumentation side, especially outside of the Java/Scala ecosystem. All our client libraries have been built to support the OpenTracing API from inception.

Jaeger does sampling via the backend, Zipkin on initialization
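To make the difference concrete, here is a minimal sketch of the two sampling styles. It is not the Jaeger or Zipkin client API: `FixedSampler`, `RemoteSampler`, and the `fetch_rate` callable are hypothetical names, and the "backend" is just a function that returns a sampling rate.

```python
from typing import Callable

MAX_TRACE_ID = 2**64  # assumption: 64-bit trace IDs


def is_sampled(trace_id: int, rate: float) -> bool:
    """Deterministic decision: the same trace ID gives the same answer everywhere."""
    return trace_id < rate * MAX_TRACE_ID


class FixedSampler:
    """Rate is chosen once, when the tracer is initialized."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def is_sampled(self, trace_id: int) -> bool:
        return is_sampled(trace_id, self.rate)


class RemoteSampler:
    """Rate is owned by the backend; the client re-fetches it from time to time."""

    def __init__(self, fetch_rate: Callable[[], float]) -> None:
        self._fetch_rate = fetch_rate  # hypothetical call to the tracing backend
        self.rate = fetch_rate()

    def refresh(self) -> None:
        self.rate = self._fetch_rate()

    def is_sampled(self, trace_id: int) -> bool:
        return is_sampled(trace_id, self.rate)
```

With the backend-controlled variant, the sampling rate can change without redeploying the instrumented application: only `refresh()` needs to run again.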

We probably can't use Zipkin's UI since it is not made for APM and we want something in GitLab.

Zipkin has one prolific author (Yuri, who works at Uber!), Jaeger has more, they are complementary!

Jaeger has well defined client libraries

Zipkin is not a CNCF project

You use Prometheus for metrics and Jaeger for tracing https://www.youtube.com/watch?time_continue=6&v=fjYAU3jayVo at 41:00 plus https://github.com/objectiser/opentracing-prometheus-example

also https://github.com/jaegertracing/jaeger/commit/9988a3d3d0be3d832c1151f17f5d6eac59a0c052 to support Prometheus metrics as default for all components

standalone https://github.com/jaegertracing/jaeger/blob/master/cmd/standalone/standalone_test.go is the part we need?

jaegertracing/all-in-one: the container runs the Jaeger backend with an in-memory store

There is also https://github.com/lightstep/lightstep-tracer-go (only one author) and https://github.com/sourcegraph/appdash (not active)

Logs are sampled but included. Still need full logs for compliance and diagnosing an error of a single user, but not for finding out what is happening to most users.

Conventions are in https://github.com/opentracing/specification/blob/master/semantic_conventions.md

Do we add APM to the Jaeger UI? No, we want everything in GitLab.

https://medium.com/opentracing/take-opentracing-for-a-hotrod-ride-f6e3141f7941 The most common measurements for RPC are request counts, error counts, and distribution of request latencies. OpenTracing instrumentation already captures all these signals and can emit metrics without additional instrumentation.
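As a rough illustration of that claim, the sketch below derives request counts, error counts, and a latency histogram from finished spans. The `Span` class, the `rpc_metrics` function, and the bucket boundaries are assumptions made up for this example; only the idea of an `error` flag on a span comes from the OpenTracing semantic conventions.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical histogram upper bounds in milliseconds; the last bucket is overflow.
BUCKETS_MS = [10, 50, 100, 500]


@dataclass
class Span:
    operation: str
    duration_ms: float
    error: bool = False  # mirrors the OpenTracing "error" tag


def rpc_metrics(spans: Iterable[Span]) -> Dict[str, dict]:
    """Aggregate per-operation request count, error count, and latency buckets."""
    metrics: Dict[str, dict] = {}
    for span in spans:
        m = metrics.setdefault(
            span.operation,
            {"requests": 0, "errors": 0,
             "latency_buckets": [0] * (len(BUCKETS_MS) + 1)},
        )
        m["requests"] += 1
        if span.error:
            m["errors"] += 1
        # Index of the first bucket whose upper bound is >= the duration.
        m["latency_buckets"][bisect_left(BUCKETS_MS, span.duration_ms)] += 1
    return metrics
```

These three signals are the kind of values that could be exported to Prometheus, so tracing instrumentation and metrics would not need to be added separately.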

https://github.com/jaegertracing has no Ruby client

Jaeger cares about accepting Zipkin https://github.com/jaegertracing/xdock-zipkin-brave

Collector is https://github.com/jaegertracing/jaeger/tree/master/cmd/collector

One problem is that the collector doesn't allow queries.

The collector normally sends those to Cassandra; should we use the same? They also have in-memory and Elasticsearch support, with MySQL coming soon. Let's use Elasticsearch, since we already use it for advanced search and will use it for logs.
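The split between a write-only collector and a separate query service over shared storage can be sketched as follows. This is a toy model, not Jaeger's code: `InMemorySpanStore`, `Collector`, `QueryService`, and the span dictionary keys are hypothetical, and the in-memory store stands in for Elasticsearch or Cassandra.

```python
from collections import defaultdict
from typing import Dict, List


class InMemorySpanStore:
    """Stand-in for the real storage backend (assumption: spans keyed by trace ID)."""

    def __init__(self) -> None:
        self._by_trace: Dict[str, List[dict]] = defaultdict(list)

    def write(self, span: dict) -> None:
        self._by_trace[span["trace_id"]].append(span)

    def read(self, trace_id: str) -> List[dict]:
        return list(self._by_trace.get(trace_id, []))


class Collector:
    """Write path only: accepts spans and persists them; it exposes no queries."""

    def __init__(self, store: InMemorySpanStore) -> None:
        self._store = store

    def submit(self, span: dict) -> None:
        self._store.write(span)


class QueryService:
    """Read path: what GitLab Rails would need to implement or call."""

    def __init__(self, store: InMemorySpanStore) -> None:
        self._store = store

    def get_trace(self, trace_id: str) -> List[dict]:
        # Return the spans of one trace ordered by start time.
        return sorted(self._store.read(trace_id), key=lambda s: s["start"])
```

Under this model the collector and the query side only share the storage, which is why picking a storage backend that GitLab Rails already talks to matters.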

You can measure Envoy with Jaeger https://www.envoyproxy.io/docs/envoy/latest/install/sandboxes/jaeger_tracing

We need to wait for MySQL support in Jaeger and then add PostgreSQL support. https://github.com/jaegertracing/jaeger/tree/master/plugin/storage doesn't have MySQL. http://jaeger.readthedocs.io/en/latest/deployment/#storage-backend says "There is ongoing work to add support for MySQL and ScyllaDB." but the main author wants to remove that https://github.com/jaegertracing/jaeger/pull/473/commits/d69b82dc8b19099c2044448512f2c0708efeb5cc#r144877703 and is considering plugins as an alternative https://github.com/jaegertracing/jaeger/issues/422

Zipkin Jaeger drama https://github.com/jaegertracing/jaeger/issues/272#issuecomment-323922392 and https://github.com/jaegertracing/jaeger/issues/140#issuecomment-299073230

Ticketmaster is experimenting https://github.com/jaegertracing/jaeger/issues/207#issuecomment-344788188

Edited by Joshua Lambert