Distributed Tracing in LabKit
Reviewers Guide
This change adds distributed tracing to LabKit. LabKit is embedded within Gitaly and Workhorse (and could potentially be used with other Go services at GitLab too).
Design Decisions
Optional OpenTracing Support
The OpenTracing API doesn't ship with the ability to switch tracing vendors at runtime. It does not support loading implementation libraries dynamically, in the way you can with JDBC, ODBC, etc.
From the application developer's point-of-view, this has several disadvantages:
- Most GitLab users do not need distributed tracing on their GitLab instance.
- Therefore, adding Jaeger (+ thrift and other downstream dependencies) to all Workhorse + Gitaly installs is not ideal.
- Other than GitLab.com, any other major GitLab instances may already have their own distributed tracing solutions (Lightstep, Zipkin, Datadog, etc) which they have invested in and have experience with. Forcing them to use the one of our choice is not ideal.
- We may use different tracing libraries in production and development.
For these reasons, I would prefer to ship Workhorse and Gitaly without tracing by default, but allow it to be added when necessary.
My initial attempt was to use the pkg/plugin Go standard package to dynamically load the tracing library when necessary. However, it turns out that this approach is far too fragile. Go plugins would result in lost productivity and support overhead.
Instead, I've gone with conditional build tags so that it's possible to statically link one or more implementations into the binary (Jaeger, Datadog, Lightstep, Zipkin, etc).
To statically compile against a specific implementation, pass the matching build tags:
- go build -tags "tracer_static tracer_static_jaeger" ./...
- go build -tags "tracer_static tracer_static_datadog" ./...
- go build -tags "tracer_static tracer_static_lightstep" ./...
- etc
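The build-tag approach pairs naturally with a small registry that each implementation file populates from its init function. The following is a minimal sketch of that pattern, not LabKit's actual code: the Tracer interface, register, newTracer and jaegerStub are hypothetical names, and the build constraint appears only as a comment so that the example compiles as a single file.

```go
package main

import "fmt"

// Tracer is a hypothetical stand-in for a real tracer type.
type Tracer interface {
	Name() string
}

// factory builds a Tracer from connection options (hypothetical signature).
type factory func(options map[string]string) (Tracer, error)

// registry holds only the implementations that were compiled into the binary.
var registry = map[string]factory{}

func register(name string, f factory) { registry[name] = f }

// In a real layout, the next three declarations would live in their own file
// that starts with a build constraint such as:
//
//	//go:build tracer_static && tracer_static_jaeger
//
// so that the file (and its dependencies) is linked only when both tags are set.
type jaegerStub struct{}

func (jaegerStub) Name() string { return "jaeger" }

func init() {
	register("jaeger", func(map[string]string) (Tracer, error) {
		return jaegerStub{}, nil
	})
}

// newTracer looks up a compiled-in implementation by name.
func newTracer(name string, options map[string]string) (Tracer, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("tracer %q is not compiled into this binary", name)
	}
	return f(options)
}

func main() {
	for _, name := range []string{"jaeger", "datadog"} {
		tracer, err := newTracer(name, nil)
		if err != nil {
			fmt.Println("unavailable:", err)
			continue
		}
		fmt.Println("available:", tracer.Name())
	}
}
```

With this shape, a binary built without any tracer tags simply has an empty registry, and asking for a tracer fails cleanly instead of pulling in vendor dependencies.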
Running the examples
The change ships with several examples:
- example/run-jaeger-static: start jaeger, and run demo app with jaeger statically linked in
- example/run-datadog-static: start dd-agent, and run demo app with datadog statically linked in
- example/run-no-tracing: exclude opentracing altogether
In each case the same application will execute. It's a tiny app running on port 8080. It has a single endpoint:
$ curl http://localhost:8080/query?ttl=100
Hello
All the service does is call itself recursively (via HTTP) as many times as given by the ttl parameter. To recursively call the application 10 times, use ttl=10, etc.
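To make the recursion concrete, here is a rough sketch of a handler that behaves as described. It is not the demo app's source: queryHandler, run and the hits counter are hypothetical names, tracing instrumentation is left out entirely, and an in-process test server on a random port is used instead of port 8080 so that the program terminates on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync/atomic"
)

// hits counts handler invocations, purely to make the recursion observable.
var hits int64

// queryHandler calls itself over HTTP with ttl-1 until ttl reaches zero,
// then answers "Hello". A missing or invalid ttl is treated as zero.
func queryHandler(w http.ResponseWriter, r *http.Request) {
	atomic.AddInt64(&hits, 1)

	ttl, err := strconv.Atoi(r.URL.Query().Get("ttl"))
	if err != nil {
		ttl = 0
	}

	if ttl > 0 {
		// r.Host is the address the client used, so this request comes back to us.
		resp, err := http.Get(fmt.Sprintf("http://%s/query?ttl=%d", r.Host, ttl-1))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}

	fmt.Fprintln(w, "Hello")
}

// run starts a throwaway server, issues one request with the given ttl, and
// returns the response body together with the number of handler invocations.
func run(ttl int) (string, int64, error) {
	atomic.StoreInt64(&hits, 0)
	srv := httptest.NewServer(http.HandlerFunc(queryHandler))
	defer srv.Close()

	resp, err := http.Get(fmt.Sprintf("%s/query?ttl=%d", srv.URL, ttl))
	if err != nil {
		return "", 0, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", 0, err
	}
	return string(body), atomic.LoadInt64(&hits), nil
}

func main() {
	body, n, err := run(3)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("body=%q handler invocations=%d\n", body, n)
}
```

In this sketch a request with ttl=3 triggers three recursive calls, so the handler runs four times in total. With tracing enabled, each of those hops would show up as a nested span in the trace.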
After issuing some requests using curl, it's time to take a look at the traces!
For Jaeger, open http://localhost:16686/search/ and find a trace.
For Datadog, log into https://app.datadoghq.com/apm/home and find the corresponding trace there.
OpenTracing "Connection Strings"
In order for this change to be adopted, it needs to be easy to use. For this reason, one of the early decisions I made is that all GitLab components, both Ruby and Go, should share a single configuration format for OpenTracing.
Unfortunately, there is no common approach to configuring an opentracing implementation.
I've elected to go with a "connection string" approach. Since the same connection string can be used to configure Gitaly (golang), Workhorse (golang), Sidekiq (ruby) and Rails (ruby), it's easy to enable when running in GDK: simply run GDK with GITLAB_TRACING=opentracing://jaeger gdk run.
All the child processes will inherit the same GITLAB_TRACING configuration.
Here are some examples of connection strings:
opentracing://jaeger
opentracing://jaeger?debug=true
opentracing://jaeger?sampler=const&sampler_param=1
opentracing://jaeger?sampler=probabilistic&sampler_param=0.1
opentracing://jaeger?http_endpoint=http%3A%2F%2Flocalhost%3A14268%2Fapi%2Ftraces
opentracing://jaeger?udp_endpoint=10.0.0.1:1234
opentracing://lightstep?access_token=123
opentracing://datadog
Note that in order to allow forwards compatibility, opentracing provider factories will ignore unrecognised parameters, unless the strictConnectionParsing=1 option is passed in.
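As an illustration of how such a connection string could be interpreted, the sketch below parses it with the standard net/url package and applies the lenient-by-default rule for unrecognised parameters. It is an assumption-laden example rather than LabKit's parser: parseConnectionString and the knownOptions allow-list are hypothetical, and the option names are taken only from the example strings above.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// knownOptions is a hypothetical per-provider allow-list, derived only from
// the example connection strings in this description.
var knownOptions = map[string]map[string]bool{
	"jaeger": {
		"debug": true, "sampler": true, "sampler_param": true,
		"http_endpoint": true, "udp_endpoint": true,
	},
	"lightstep": {"access_token": true},
	"datadog":   {},
}

// parseConnectionString splits "opentracing://provider?key=value" into the
// provider name and its recognised options. Unrecognised parameters are
// dropped unless strictConnectionParsing=1 is present, in which case they
// produce an error.
func parseConnectionString(s string) (string, map[string]string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", nil, err
	}
	if u.Scheme != "opentracing" {
		return "", nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	known, ok := knownOptions[u.Host]
	if !ok {
		return "", nil, fmt.Errorf("unknown provider %q", u.Host)
	}

	query := u.Query() // percent-encoded values are decoded here
	strict := query.Get("strictConnectionParsing") == "1"

	options := map[string]string{}
	for key, values := range query {
		if key == "strictConnectionParsing" {
			continue
		}
		if !known[key] {
			if strict {
				return "", nil, fmt.Errorf("unrecognised parameter %q", key)
			}
			continue // lenient mode: ignore for forwards compatibility
		}
		options[key] = values[0]
	}
	return u.Host, options, nil
}

func main() {
	// Child processes would read the same inherited environment variable.
	conn := os.Getenv("GITLAB_TRACING")
	if conn == "" {
		conn = "opentracing://jaeger?sampler=const&sampler_param=1"
	}
	provider, options, err := parseConnectionString(conn)
	if err != nil {
		fmt.Println("invalid connection string:", err)
		return
	}
	fmt.Println("provider:", provider, "options:", len(options))
}
```

Keeping the lenient path as the default means an older component can receive a connection string written for a newer one without failing at startup.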
File Sizes
- App with jaeger statically linked: binary: 9604k
- App with datadog statically linked: binary: 8880k
- No tracing: binary: 8124k
Downstream Merge Requests
- Workhorse gitlab-workhorse!325 (merged)
- Gitaly gitaly!976 (merged)
- GitLab-CE: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21280