Enable Profile Guided Optimizations on GitLab's Go services: Gitaly, Workhorse, Pages, Container Registry etc


Background

What is Profile-guided optimization (PGO)?

From https://go.dev/doc/pgo

Profile-guided optimization (PGO), also known as feedback-directed optimization (FDO), is a compiler optimization technique that feeds information (a profile) from representative runs of the application back into the compiler for the next build of the application, which uses that information to make more informed optimization decisions. For example, the compiler may decide to more aggressively inline functions which the profile indicates are called frequently.

In Go, the compiler uses CPU pprof profiles as the input profile, such as from runtime/pprof or net/http/pprof.

As of Go 1.21, benchmarks for a representative set of Go programs show that building with PGO improves performance by around 2-7%. We expect performance gains to generally increase over time as additional optimizations take advantage of PGO in future versions of Go.

(emphasis mine)
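
To make the kind of profile the quoted passage refers to concrete, here is a minimal sketch of collecting a CPU profile with runtime/pprof. The busyWork function and the collectCPUProfile helper are hypothetical names introduced only for illustration; a real service would profile its actual workload instead. The sketch writes the result to default.pgo, the file name the Go toolchain (1.21 and later) looks for in the main package directory when building with the default -pgo=auto setting.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical stand-in for the service's real workload.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// collectCPUProfile (hypothetical helper) runs work with CPU profiling
// enabled and returns the profile in pprof format, which is a
// gzip-compressed protocol buffer.
func collectCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	var result int
	data, err := collectCPUProfile(func() { result = busyWork(200_000_000) })
	if err != nil {
		log.Fatal(err)
	}
	// Assumption: the program runs from the main package directory, so
	// that a later "go build" picks the file up automatically.
	if err := os.WriteFile("default.pgo", data, 0o644); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("workload result %d, wrote %d bytes to default.pgo\n", result, len(data))
}
```

A profile gathered from a synthetic loop like this is not representative of production traffic, so it only demonstrates the mechanics.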

Proposal

Enhance the compilation of GitLab's Go services using Profile-guided optimization.

GitLab already performs continuous profiling on most Golang services, using Google Cloud Profiler - see https://console.cloud.google.com/profiler/gitaly/cpu?project=gitlab-production for continuous profile data for Gitaly, for example:

[Screenshot: continuous CPU profile data for Gitaly in Google Cloud Profiler]

LabKit already exposes pprof endpoints for Go applications. It might be possible to collect this data from production and feed it back into the compilation process using the -pgo option in the Go toolchain.
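
As a rough sketch of what pulling a profile from such an endpoint could look like, the program below downloads a CPU profile over HTTP and saves it for use with -pgo. It assumes the service serves the standard net/http/pprof handler at /debug/pprof/profile; the address localhost:6060, the 30-second duration, and the helper names profileURL and fetchProfile are hypothetical and would need to match the real deployment.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"time"
)

// profileURL builds the CPU profile URL exposed by net/http/pprof.
// The seconds parameter controls how long the server samples for.
func profileURL(base string, seconds int) string {
	return fmt.Sprintf("%s/debug/pprof/profile?seconds=%d", base, seconds)
}

// fetchProfile (hypothetical helper) downloads a CPU profile and copies
// it to out, returning the number of bytes written.
func fetchProfile(client *http.Client, base string, seconds int, out io.Writer) (int64, error) {
	resp, err := client.Get(profileURL(base, seconds))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.Copy(out, resp.Body)
}

func main() {
	const seconds = 30
	// The timeout must exceed the sampling window, since the server
	// only responds once profiling has finished.
	client := &http.Client{Timeout: (seconds + 10) * time.Second}

	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Assumption: hypothetical address of a service's pprof listener.
	n, err := fetchProfile(client, "http://localhost:6060", seconds, f)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("saved %d bytes to cpu.pprof\n", n)
}
```

The saved file could then be passed to the build with go build -pgo=cpu.pprof. A single node's 30-second sample is unlikely to be representative on its own, so profiles from several nodes would probably need to be merged first.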

Alternatively, we could collect this data from performance benchmarking and use it in a similar way.

Implementation

Google Cloud Profiler is able to export pprof data from the continuous profiling data. This is very handy since it provides an easy way to read aggregated pprof data from across all nodes in the production cluster.

This can be done using profile downloads: https://cloud.google.com/profiler/docs/downloading-profiles

At present, there doesn't appear to be an API for this, but it might be worth asking Google whether the profiles could be obtained via an API.

cc @igorwwwwwwwwwwwwwwwwwwww @stanhu
