Skip to content

Draft: Tracing: Have a deeper look into Git process with trace2

Quang-Minh Nguyen requested to merge qmnguyen0711/implement-trace2-tracing into master

For #4816 (closed)

Note: This MR is blocked by Tracing: Add span's sampling status check to la... (#4805 - closed)

Gitaly offloads a lot of complicated operations to git processes. When something goes wrong, it's quite hard to debug deeper into Git processes. In local environment, it's easy to plug some performance tools to measure how git process performs. It's not the case for the production environment. We can use some exposed logs, metrics, or similar statistical data. I wish we had a full descriptive picture, step by step, of an action. Distributed tracing may solve this problem. However, it has major problem: Most spans stop at git-spawning steps.

Git bundles a tool called Trace2. This tool enables Git processes to print underlying significant events and corresponding measurements. It collects the activities of child processes as well. Obviously, it adds some overhead to git processes and maybe some risks when passing the events in a fd. It may not very wise to enable Trace2 for all Git processes. However, it's a perfect fits for distributed tracing. We are sampling data at a very low rate. The exported events are also convenient to transform into Tracing spans.

So, the initial idea is to enable Trace2 for sampled trace only (blocked by Tracing: Add span's sampling status check to la... (#4805 - closed)). Apart from being used by distributed tracing, trace2 is also useful to export some underlying metrics, such as pack-objects (#4699 (closed)). So, it makes sense to convert the events to a tree data structure. This tree provides a walking method to implement hooks that serve data exporting.

I also opened another MR to link traces of gitaly-hooks (tracing: Fix disconnected spans in gitaly-hooks (!5419 - merged)). Afterward, we'll have full visibility of git hooks (pre-preceive, post-preceive, update-ref, etc.)

Before

Screenshot_2023-02-27_at_17.08.10

The trace stops at git-something command. We have no idea what's going on.

After

Screenshot_2023-02-27_at_17.09.16

Screenshot_2023-02-27_at_17.09.29

Screenshot_2023-02-27_at_17.09.46

Screenshot_2023-02-27_at_17.10.03

Edited by Quang-Minh Nguyen

Merge request reports