Skip to content

Trace2: Expose internal pack objects metrics with trace2

For #4699 (closed)

git-pack-objects is a hot topic recently. It's one of the core commands for pulling stuff from GitLab. We applied plenty of optimizations and solutions for this command. It makes sense to add some metrics to increase the observability. It may help when investigating performance issues.

This MR implement trace2 hook that exports pack-object metrics. This hook uses the existing trace2 hooking foundation. It hooks into the trace2 tree for the following events:

  • data:pack-objects:write_pack_file/wrote
  • data:pack-objects:loosen_unused_packed_objects/loosened
  • data:pack-objects:stdin_packs_found
  • data:pack-objects:stdin_packs_hints
  • pack-objects:enumerate-objects
  • pack-objects:prepare-pack
  • pack-objects:write-pack-file

In the source, I can find some more data, but I don't think they are relevant.

When run into such traces, the hook increases the corresponding Prometheus' histograms. In addition, it also dumps equivalent fields to command stats. Eventually, Gitaly outputs logs having associated fields.

Performance impact

The biggest concern is how this hook impacts the performance of git-pack-objects command. There are some data to justify its impact:

  • The output events are predictable. Unlike other commands, this command has a nearly fixed number of events. The full flow includes enumerating-objects, prepare-pack and write-pack-file steps. Each step has a handful of events. The total size of a typical event is around 3.4kb. I don't think writing and reading such small files will become a problem. The size of trace2 events doesn't increase according to size of pack files.
  • ´trace2.Parser is fast for this small amount of events. Testing on my local environment, it takes a couple of micro-seconds.
  • The execution rate of this command is slow in each node. In normal conditions, the execution rate is around 3-4 RPS in most busy nodes and may peak to 15 RPS in exceptional situations.

Of course, let's roll it out under feature flag control. We can always roll back if something goes wrong

Screenshot_2023-03-22_at_10.48.19 Execution rate of git-pack-objects commands on production - Source

Sample recorded events of git-pack-objects.

{"event":"version","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","time":"2023-02-22T12:05:04.840009Z","evt":"3","exe":"2.39.1"}
{"event":"start","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","t_abs":0.002338,"argv":["git","pack-objects","toon","--compression=0"]}
{"event":"def_repo","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"worktree":"/gitaly"}
{"event":"cmd_name","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","name":"pack-objects","hierarchy":"pack-objects"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","line":4460,"repo":1,"nesting":1,"category":"pack-objects","label":"enumerate-objects"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":2,"category":"progress","label":"Enumerating objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.001083,"nesting":2,"category":"progress","label":"Enumerating objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.001435,"nesting":1,"category":"pack-objects","label":"enumerate-objects"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":1,"category":"pack-objects","label":"prepare-pack"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":2,"category":"progress","label":"Counting objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.000075,"nesting":2,"category":"progress","label":"Counting objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.000090,"nesting":1,"category":"pack-objects","label":"prepare-pack"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":1,"category":"pack-objects","label":"write-pack-file"}
{"event":"region_enter","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"nesting":2,"category":"progress","label":"Writing objects"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.003972,"nesting":2,"category":"progress","label":"Writing objects"}
{"event":"data","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_abs":0.008451,"t_rel":0.003996,"nesting":2,"category":"pack-objects","key":"write_pack_file/wrote","value":"1"}
{"event":"region_leave","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_rel":0.004003,"nesting":1,"category":"pack-objects","label":"write-pack-file"}
{"event":"data","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","repo":1,"t_abs":0.008482,"t_rel":0.008482,"nesting":1,"category":"fsync","key":"fsync/writeout-only","value":"2"}
{"event":"exit","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","t_abs":0.008489,"code":0}
{"event":"atexit","sid":"1234/20230222T120504.839855Z-Ha0f0bee5-P0000ccc6","thread":"main","t_abs":0.008495,"code":0}

Result

Screenshot_2023-03-27_at_11.39.58

Edited by Quang-Minh Nguyen

Merge request reports