Dump Trace2 events of a user/project using a feature flag
A replacement for #4808 (closed)
Problem
Since we integrated Trace2, we have a much deeper view into Git's internal processes. Of course, this comes at a cost: overhead. While a Git process runs, span objects are created; once it finishes, the tracing library sends the traces to the collector. Enabling tracing for all RPC requests would be a heavy burden for both the Gitaly process and the collector. At the moment, we sample requests at a 0.1% rate.
This approach works well on GitLab.com or similar environments with a high volume of requests. However, a less busy RPC might not receive enough traffic to produce any trace, and self-managed instances face the same low-traffic situation. In addition, many customers don't set up tracing for their instance, and even those who do cannot easily share a trace with us. As a result, Trace2 data from Gitaly servers has seen little use. Some example issues where Trace2 could help the investigation considerably:
- Local git clone sizes started growing massively... (#5688)
- https://gitlab.com/gitlab-org/gitaly/-/issues/5680+
This issue proposes a method for a customer or for a GitLab team member to collect Trace2 events for a particular user/project.
Solution
The idea is simple. Anyone with access can turn on an operational feature flag for a user or project, for example:
/chatops run feature set gitaly_dangerous_force_collect_traces true --user=qmnguyen0711
/chatops run feature set gitaly_dangerous_force_collect_traces true --project=gitlab-org/gitlab
Administrators of a self-managed instance can enable the feature flag via the Rails console (doc). Once the flag is enabled, Gitaly starts dumping the Trace2 events of matching requests into its logs.
Some factors to consider:
- Trace2 adds some overhead to each Git process. Although Trace2 is fast, that overhead is not free. We should therefore consider adding a mechanism to limit Trace2 invocations in case the flag is enabled globally.
- The format of the output events should be picked carefully.
- Consider adding a section about this feature to https://docs.gitlab.com/ee/administration/gitaly/monitoring.html or similar place.