Distributed Tracing in LabKit

Andrew Newdigate requested to merge d-tracing into master

Reviewers Guide

This change adds distributed tracing to LabKit. LabKit is embedded within Gitaly and Workhorse (and could potentially be used with other Go services at GitLab too).

Design Decisions

Optional OpenTracing Support

The OpenTracing API does not ship with a way to switch tracing vendors. It cannot load implementation libraries dynamically at runtime, the way drivers can be loaded with JDBC, ODBC, etc.

From the application developer's point of view, hard-wiring a single tracing implementation into every build has several disadvantages:

  1. Most GitLab users do not need distributed tracing on their GitLab instance.
  2. Therefore, adding Jaeger (+ thrift and other downstream dependencies) to all Workhorse + Gitaly installs is not ideal.
  3. Apart from GitLab.com, other major GitLab instances may already have their own distributed tracing solutions (Lightstep, Zipkin, Datadog, etc) which they have invested in and have experience with. Forcing them to use the one we chose is not ideal.
  4. We may want to use different tracing libraries in production and development.

For these reasons, I would prefer to ship Workhorse and Gitaly without tracing by default, but allow it to be added when necessary.

My initial attempt used the plugin package from the Go standard library to load the tracing library dynamically when necessary. However, this approach turned out to be far too fragile: Go plugins would result in lost productivity and support overhead.

Instead, I've gone with conditional build tags, so that one or more implementations (Jaeger, Datadog, Lightstep, Zipkin, etc) can be statically linked into the binary.

To statically compile against a specific implementation, the following approach can be taken:

  • go build -tags "tracer_static tracer_static_jaeger" ./...
  • go build -tags "tracer_static tracer_static_datadog" ./...
  • go build -tags "tracer_static tracer_static_lightstep" ./...
  • etc
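
As a rough illustration of how build tags can select statically linked implementations, here is a minimal sketch of a registry that implementations add themselves to from init(). The names registerTracer, tracerFactory, newTracer and compiledIn are hypothetical, and the factory is a stub that returns a string rather than a real tracer. In a multi-file layout each registration would sit in its own file guarded by a build constraint requiring the relevant tags; here it is inlined so the sketch runs as a single file.

```go
package main

import (
	"fmt"
	"sort"
)

// tracerFactory builds a tracer from connection parameters. In this sketch it
// only returns a description; a real factory would return a tracer object.
type tracerFactory func(params map[string]string) (string, error)

// registry maps an implementation name to its factory (hypothetical design).
var registry = map[string]tracerFactory{}

// registerTracer makes an implementation available under the given name.
func registerTracer(name string, f tracerFactory) {
	registry[name] = f
}

// In a multi-file layout this init() would live in its own file, guarded by a
// build constraint that requires both the tracer_static and
// tracer_static_jaeger tags, so it is compiled only when those tags are set.
func init() {
	registerTracer("jaeger", func(params map[string]string) (string, error) {
		return fmt.Sprintf("jaeger tracer (stub) with %d params", len(params)), nil
	})
}

// newTracer looks up a compiled-in implementation by name.
func newTracer(name string, params map[string]string) (string, error) {
	f, ok := registry[name]
	if !ok {
		return "", fmt.Errorf("tracer %q is not compiled into this binary", name)
	}
	return f(params)
}

// compiledIn lists the implementations present in this build, sorted by name.
func compiledIn() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	fmt.Println("compiled in:", compiledIn())
	for _, name := range []string{"jaeger", "datadog"} {
		desc, err := newTracer(name, nil)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(desc)
	}
}
```

With this shape, a binary built without any tracer tags simply has an empty registry, and the vendor libraries are never imported.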

Running the examples

The change ships with several examples:

  • example/run-jaeger-static - start jaeger, and run demo app with jaeger statically linked in
  • example/run-datadog-static - start dd-agent, and run demo app with datadog statically linked in
  • example/run-no-tracing - exclude opentracing altogether

In each case the same application will execute. It's a tiny app running on port 8080. It has a single endpoint:

$ curl http://localhost:8080/query?ttl=100
Hello

All the service does is call itself recursively (via HTTP) as many times as given by the ttl parameter. To recursively call the application 10 times, use ttl=10, etc.

After issuing some requests using curl, it's time to take a look at the traces!

For Jaeger, open http://localhost:16686/search/ and find a trace. You should see something like this:

image

For Datadog, log into https://app.datadoghq.com/apm/home, where you should see something like this:

image

OpenTracing "Connection Strings"

In order for this change to be adopted, it needs to be easy to use. For this reason, one of my early decisions was that all GitLab components, whether Ruby or Go, should share a single OpenTracing configuration.

Unfortunately, there is no common approach to configuring an OpenTracing implementation.

I've elected to go with a "connection string" approach. Since the same connection string can be used to configure Gitaly (Go), Workhorse (Go), Sidekiq (Ruby) and Rails (Ruby), tracing is easy to enable when running in GDK: simply run GITLAB_TRACING=opentracing://jaeger gdk run.

All the child processes will inherit the same GITLAB_TRACING configuration.

Here are some examples of connection strings:

  • opentracing://jaeger
  • opentracing://jaeger?debug=true
  • opentracing://jaeger?sampler=const&sampler_param=1
  • opentracing://jaeger?sampler=probabilistic&sampler_param=0.1
  • opentracing://jaeger?http_endpoint=http%3A%2F%2Flocalhost%3A14268%2Fapi%2Ftraces
  • opentracing://jaeger?udp_endpoint=10.0.0.1:1234
  • opentracing://lightstep?access_token=123
  • opentracing://datadog

Note that, to allow forward compatibility, OpenTracing provider factories ignore unrecognised parameters unless the strictConnectionParsing=1 option is passed in.
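
To make the format concrete, here is a minimal sketch of how such a connection string could be parsed with Go's net/url package, including the lenient and strict handling of unrecognised parameters. The function parseConnectionString and the per-driver parameter table are assumptions for illustration, with parameter names taken only from the example strings above; the parsing code in this change may be organised differently.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// recognised lists, per driver, the parameters this sketch accepts. The names
// come from the example connection strings; the real sets are an assumption.
var recognised = map[string]map[string]bool{
	"jaeger": {
		"debug": true, "sampler": true, "sampler_param": true,
		"http_endpoint": true, "udp_endpoint": true,
	},
	"lightstep": {"access_token": true},
	"datadog":   {},
}

// parseConnectionString splits opentracing://<driver>?<options> into the
// driver name and its options. Unrecognised options are dropped, unless
// strictConnectionParsing=1 is present, in which case they are an error.
func parseConnectionString(s string) (string, map[string]string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", nil, err
	}
	if u.Scheme != "opentracing" {
		return "", nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	known, ok := recognised[u.Host]
	if !ok {
		return "", nil, fmt.Errorf("unknown driver %q", u.Host)
	}

	query := u.Query() // percent-encoded values are decoded here
	strict := query.Get("strictConnectionParsing") == "1"
	query.Del("strictConnectionParsing")

	opts := make(map[string]string, len(query))
	for key, values := range query {
		if !known[key] {
			if strict {
				return "", nil, fmt.Errorf("unrecognised parameter %q", key)
			}
			continue // lenient mode: ignore for forward compatibility
		}
		opts[key] = values[0]
	}
	return u.Host, opts, nil
}

func main() {
	// Child processes inherit GITLAB_TRACING; fall back to a sample value.
	conn := os.Getenv("GITLAB_TRACING")
	if conn == "" {
		conn = "opentracing://jaeger?sampler=probabilistic&sampler_param=0.1"
	}
	driver, opts, err := parseConnectionString(conn)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("driver:", driver, "options:", opts)
}
```

Because the string is an ordinary URL, values such as http_endpoint need to be percent-encoded, as in the example list above.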

File Sizes

  • App with Jaeger statically linked: binary: 9604k (1480k, about 18%, larger than without tracing)
  • App with Datadog statically linked: binary: 8880k (756k, about 9%, larger than without tracing)
  • No tracing: binary: 8124k

Downstream Merge Requests

Edited by Andrew Newdigate