Tracing: Enhance git process spans
We enabled distributed tracing on GitLab.com production with a sample rate of 0.1%. All captured traces are sent to Stackdriver for exploration, and they are available to every engineer who has access to the GitLab GCP dashboard. However, the data is not very useful yet.
There are multiple problems with the current state:
- The spans are empty. They cannot describe even a basic request flowing to Gitaly. We'll need to add more key spans.
- There is too much noise, especially the URIs of spawned commands.
- The sampling is heavily skewed toward the most popular calls. This is an expected result of random sampling, but I think we can do better by applying a more sophisticated sampling algorithm scoped to each RPC, so that less common RPCs no longer end up without any traces.
- We don't feed logs to GCP, so Stackdriver cannot pull the related logs. Each span should therefore contain the correlation ID, so that the matching logs can be queried from Kibana. We also need docs on how to use tracing.
- Other components (Workhorse, Shell, etc.) should export their data to Stackdriver as well. That makes tracing more useful, end to end.
- Some more minor improvements regarding labels and metadata.
I think this is low-hanging fruit that would significantly improve the debugging experience on production, alongside existing tools like Flamegraph (GCP profiler), Prometheus metrics, and logs.