Improve Gitaly profiling by fixing zlib framepointers in omnibus-gitlab
Problem
On Gitaly hosts, during certain workloads a large percentage of CPU time is spent in git
processes, and those processes rely on a few third party libraries, such as zlib
.
However, the CPU time spent by git
in calls to helper library functions ends up being poorly accounted in profiles, due to those libraries being built without frame-pointer support. This breaks stack traces, so that we know only which library function was being called -- we cannot see why, because without the stack trace, we cannot see which git
subcommand and code path were making that call.
In the following example, 47% of git
CPU time (and 25% of the overall host CPU capacity) was spent in these opaque calls to zlib
. This flamegraph has correct traces for stacks that were fully in git
itself, but it has broken stack traces whenever git
was calling a library function. Those broken traces only show the top stack frame, rather than the full stack.
More details for this particular example are in the incident issue.
Solution
In this issue, we will fix zlib
tracing by adding the no-omit-frame-pointer
flag when building zlib
. This will improve the quality of Gitaly host profiles, giving full stack traces of git
processes that spend time reading or writing compressed data (e.g. packfiles).
Background
Fixing stack traces for git
calls into zlib
functions like inflate_fast
and adler32_z
will improve the quality of Gitaly profiles by fixing the most common remaining cause of broken stack traces in flamegraphs. Without this fix, we only get the top frame of stacks that were calling a zlib
library function, without the context of what part of git was making that call.
The git
binary and its library dependencies are built and shipped by omnibus-gitlab
:
The same problem (lack of frame-pointers) exists for all 3 libraries that git
dynamically links to. For now, we are only fixing zlib
, because it is called often enough to represent a significant percentage of CPU time for some workloads. We can later do the same for the other libraries if they turn out to be a practical problem.
Reference
Here are supporting details, showing that Gitaly's build of git
explicitly links to the omnibus libraries (rather than the generic version of the libraries that may be installed by other packages). Using libraries built and shipped by omnibus means this fix can be applied in the omnibus build configuration.
# Show the full path to the "git" binary we are using currently in production.
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ pgrep -a -f '/git .*cat-file' | awk '{ print $2 }' | sort | uniq -c
267 /var/opt/gitlab/gitaly/run/gitaly-1714157/git-exec-663879678.d/git
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ sudo realpath /var/opt/gitlab/gitaly/run/gitaly-1714157/git-exec-663879678.d/git
/opt/gitlab/embedded/bin/gitaly-git-v2.35.1.gl1
# Show that this "git" binary comes from our omnibus package ("gitlab-ee").
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ dpkg-query -S /opt/gitlab/embedded/bin/gitaly-git-v2.35.1.gl1
gitlab-ee: /opt/gitlab/embedded/bin/gitaly-git-v2.35.1.gl1
# Crucially, the omnibus package also provides some of the "git" binary's dynamic dependencies, including the zlib dependency where git sometimes spends a lot of CPU time.
# This means we have control over how it is built and can enable frame-pointers.
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ ldd /opt/gitlab/embedded/bin/gitaly-git-v2.35.1.gl1 | grep 'gitlab'
libpcre2-8.so.0 => /opt/gitlab/embedded/lib/libpcre2-8.so.0 (0x00007f9b10db0000)
libz.so.1 => /opt/gitlab/embedded/lib/libz.so.1 (0x00007f9b10d92000)
libiconv.so.2 => /opt/gitlab/embedded/lib/libiconv.so.2 (0x00007f9b10ca7000)
# The zlib library already includes symbols. So if we enable frame-pointers, we can get useful stack traces from it.
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ file /opt/gitlab/embedded/lib/libz.so.1
/opt/gitlab/embedded/lib/libz.so.1: symbolic link to libz.so.1.2.12
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ file $( realpath /opt/gitlab/embedded/lib/libz.so.1 )
/opt/gitlab/embedded/lib/libz.so.1.2.12: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=06678e82cb518542770f8cc3d074350b6607e6a3, not stripped
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ nm /opt/gitlab/embedded/lib/libz.so.1 | wc -l
199
Testing for frame-pointer support
And for reference, here is a litmus test I created a while ago for inferring whether or not an x86-64 binary was built with frame-pointer support.
If frame-pointers are supported, then most functions should begin by pushing the previous frame's base-pointer onto the stack and then copying the current top-of-stack address into the base-pointer register. There are exceptions, but most functions should do this.
That 2nd step of copying RSP to RBP is a pretty distinctive instruction. Here we count how often that instruction occurs in the disassembled binary. If the count is significantly smaller than the number of traceable functions, then the compiler probably subverted frame-pointer support, instead treating register RBP as a general-purpose register (and consequently breaking fp-based stack traces).
# Show that the zlib binary is not providing frame-pointers.
msmiley@file-cny-01-stor-gprd.c.gitlab-production.internal:~$ objdump --disassemble /opt/gitlab/embedded/lib/libz.so.1 | grep -c 'mov *%rsp,%rbp'
1