GitLab Pages perf profiles consistently support safe and correct symbol resolution
Problem
Under certain circumstances, perf profiling data can be incomplete or incorrect.
How are we fixing it?
Add a GNU build-id to each of the Go binaries built by GitLab Pages.
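As an illustration only: one way to attach a GNU build-id to a Go binary is the Go linker's -B flag, which takes a 0x-prefixed hex string with an even number of digits. This issue does not say which mechanism GitLab Pages uses, so the sketch below is an assumption. The helper name gnuBuildIDFlag and the seed string are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// gnuBuildIDFlag returns a Go linker flag of the form "-B 0x<40 hex digits>".
// The id is the SHA-1 of a caller-supplied seed. The seed must differ for
// every distinct build (hypothetical choice: binary name plus commit SHA),
// otherwise two different binaries would share one build-id.
func gnuBuildIDFlag(seed string) string {
	sum := sha1.Sum([]byte(seed))
	return "-B 0x" + hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical seed; a real build script would substitute its own values.
	flag := gnuBuildIDFlag("gitlab-pages@0123abcd")

	// Print a command line that a build script could run.
	fmt.Printf("go build -ldflags %q .\n", flag)
}
```

Deriving the id from a seed keeps builds reproducible, but it is only as unique as the seed. Anything that changes the resulting binary (toolchain version, build tags) should be part of the seed.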
Background
This issue addresses one of the most common causes of those defects for Go binaries. Go binaries are particularly prone to this, as their default build-id format is incompatible with perf, and that implicitly inhibits perf's strongest protection mechanisms against this problem.
Running perf profiles happens in 2 phases: capture and analysis. The 1st phase (perf record) captures raw stack trace data, and the 2nd phase (e.g. perf script, perf top, etc.) does symbol resolution, finding names for the functions in the stack traces. If the binary that was traced during the 1st phase gets deleted or replaced before the 2nd phase (which could be moments or days later), then symbol resolution can fail, producing either no results or incorrect results.
We can prevent that risk by adding a unique GNU build-id to each binary. This allows the 1st phase (perf record) to associate the stack traces with a distinctive id specific to that binary, and to capture a copy of that binary in its build-id cache. This makes the 2nd phase more reliable in 2 ways:
- Symbols are not missing: Because the correct binary has been copied into the build-id cache, any symbols it contains are available for use at any time, even if the binary was later deleted or replaced. (There are also other ways for symbols to be missing, but this handles some important cases: frequent deploys, reanalyzing profiles captured during an incident, etc.)
- Symbols are correct: Because the captured stack traces are tagged with a specific build-id, even if a different build of the binary has been deployed to the original binary's path, perf will recognize that the new binary is not the one that was profiled and therefore will not use its (incorrect) symbol tables. Instead, it will use the cached binary, which has correct symbols. Even if the cache is cleared, perf will refuse to use the incorrect symbols from the newer binary, so in all cases we avoid incorrect symbol resolution.
See the epic description for more background: &666 (closed)