Skip to content

Tracing: Have a deeper look into Git process with trace2

Quang-Minh Nguyen requested to merge qmnguyen0711/implement-trace2-tracing into master

For #4816 (closed)

Gitaly offloads a lot of complicated operations to git processes. When something goes wrong, it's quite hard to debug deeper into Git processes. In local environment, it's easy to plug some performance tools to measure how git process performs. It's not the case for the production environment. We can use some exposed logs, metrics, or similar statistical data. I wish we had a full descriptive picture, step by step, of an action. Distributed tracing may solve this problem. However, it has major problem: Most spans stop at git-spawning steps.

Git bundles a tool called Trace2. This tool enables Git processes to print underlying significant events and corresponding measurements. It collects the activities of child processes as well. Obviously, it adds some overhead to git processes and maybe some risks when passing the events in a fd. It may not very wise to enable Trace2 for all Git processes. However, it's a perfect fits for distributed tracing. We are sampling data at a very low rate. The exported events are also convenient to transform into Tracing spans.

So, the initial idea is to enable Trace2 for sampled trace only (blocked by Tracing: Add span's sampling status check to la... (#4805 - closed)). Apart from being used by distributed tracing, trace2 is also useful to export some underlying metrics, such as pack-objects (#4699 (closed)). So, it makes sense to convert the events to a tree data structure. This tree provides a walking method to implement hooks that serve data exporting.

I also opened another MR to link traces of gitaly-hooks (tracing: Fix disconnected spans in gitaly-hooks (!5419 - merged)). Afterward, we'll have full visibility of git hooks (pre-preceive, post-preceive, update-ref, etc.)

Before

Screenshot_2023-02-27_at_17.08.10

The trace stops at git-something command. We have no idea what's going on.

After

Screenshot_2023-02-27_at_17.09.16

Screenshot_2023-02-27_at_17.09.29

Screenshot_2023-02-27_at_17.09.46

Screenshot_2023-02-27_at_17.10.03

Trace2 hooking is also added to Gitaly gRPC logs:

Screenshot_2023-03-02_at_15.17.07

Note

@igorwwwwwwwwwwwwwwwwwwww mentioned after Implement tracing sampling status checker (labkit!175 - merged) that there is a possibility the size of Gitaly binary increases after adding that sampling check. I cast the opentracing.Span to discrete structs. This may attach unexpected codes in the binary. Here's the comparison, both versions build with make all WITH_BUNDLED_GIT=YesPlease BUILD_TAGS="tracer_static tracer_static_jaeger" command.

  • Before this MR:

Screenshot_2023-03-09_at_20.27.53

  • This MR:

Screenshot_2023-03-09_at_20.31.51

The binary size indeed increases by around 5%. It looks like we need more work on this 👀

Edited by Quang-Minh Nguyen

Merge request reports