
Production Jaeger tracing instance

Now that GitLab is well instrumented with distributed tracing, we should come up with a plan for rolling it out in production.

Proposal

  • Consider adding Jaeger agent as a "sidecar" for each instrumented GitLab process
  • Enable easy configuration of Jaeger via Omnibus (when enabled, the Jaeger agent runs on the host; when disabled, it is turned off): gitlab-org/omnibus-gitlab#4104 (closed)
    • Configuration of the GITLAB_TRACING env var for GitLab processes
    • Configuration of a Jaeger service endpoint for the agent to forward traces on to
    • Let's keep standing up the Jaeger service itself out of scope for now (@joshlambert do you agree?)
  • Use GKE and the Jaeger operator: https://github.com/jaegertracing/jaeger-operator
  • Use ElasticCloud as the backend
  • Set up a Chef recipe for the Jaeger agent
  • Roll out Jaeger agent to the web/api/git/gitaly fleet in staging
  • Start with a very low sample rate, using head-based sampling, and enable tracing on only a subset of staging
  • Monitor, test, validate, ramp up
  • Do the same in canary
  • Do the same in production
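
To make the GITLAB_TRACING and head-sampling items above more concrete, here is a minimal Python sketch. It assumes the connection-string form `opentracing://jaeger?<param>=<value>&...` with `udp_endpoint`, `sampler` and `sampler_param` parameters; please check this against the current GitLab tracing documentation before relying on it. The helper names, the `localhost:6831` agent address and the 0.1% rate are illustrative assumptions, not values decided in this issue. `head_sample` only illustrates the idea of deciding once, at the root span, from the trace ID; it is not Jaeger's actual sampler code.

```python
from urllib.parse import urlencode

MAX_TRACE_ID = 2**64  # assumes 64-bit trace IDs for this illustration


def build_gitlab_tracing(udp_endpoint: str, sample_rate: float) -> str:
    """Build a GITLAB_TRACING value for the Jaeger driver.

    The parameter names are assumptions based on the documented
    connection-string format; verify them before use.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    params = {
        "udp_endpoint": udp_endpoint,         # local Jaeger agent (the "sidecar")
        "sampler": "probabilistic",           # head-based probabilistic sampling
        "sampler_param": f"{sample_rate:g}",  # fraction of traces to keep
    }
    return "opentracing://jaeger?" + urlencode(params)


def head_sample(trace_id: int, sample_rate: float) -> bool:
    """Head sampling: decide at the start of a trace whether to keep it.

    The decision depends only on the trace ID, so every process that sees
    the same trace ID reaches the same answer.
    """
    return (trace_id % MAX_TRACE_ID) < sample_rate * MAX_TRACE_ID


if __name__ == "__main__":
    # Hypothetical starting point for staging: keep roughly 1 trace in 1000.
    print(build_gitlab_tracing("localhost:6831", 0.001))
```

Because the sampling decision is made up front, unsampled requests do very little tracing work, which is why a very low rate is a cautious starting point for staging.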

Roll out plan

  • deploy Jaeger to production
  • enable tracing in canary
  • measure performance impact on the application from enabling tracing
  • measure Jaeger performance
  • gradually roll out tracing to the rest of the fleet
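
One possible way to tie "measure performance impact" to "gradually roll out" is a simple gate: compare p99 latency with and without tracing, and only raise the sample rate when the overhead stays under a budget. The sketch below is purely illustrative; the 2% budget, the 10x ramp factor and the function names are hypothetical and would need to be agreed on separately.

```python
from statistics import quantiles


def p99(samples_ms):
    """99th percentile of a list of latency samples (needs >= 2 samples)."""
    return quantiles(samples_ms, n=100)[98]


def relative_overhead(baseline_ms, traced_ms):
    """Relative p99 latency increase of traced hosts over baseline hosts."""
    base = p99(baseline_ms)
    return (p99(traced_ms) - base) / base


def next_sample_rate(current, overhead, max_overhead=0.02, factor=10.0):
    """Hold the current rate if overhead exceeds the budget, else ramp up.

    max_overhead and factor are hypothetical defaults for illustration.
    """
    if overhead > max_overhead:
        return current
    return min(1.0, current * factor)


if __name__ == "__main__":
    baseline = [float(x) for x in range(1, 101)]  # synthetic latencies in ms
    traced = [x * 1.01 for x in baseline]         # synthetic 1% slowdown
    overhead = relative_overhead(baseline, traced)
    print(next_sample_rate(0.001, overhead))
```

The same check could be repeated at each stage (staging, canary, production) before widening the rollout.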

cc @joshlambert

Edited by Michal Wasilewski