Record and track application boot time
What does this MR do and why?
We are looking to improve Rails startup time. In order to measure progress, we need a baseline for comparison. To that end, this MR adds a mechanism to track the time it takes for the primary Rails process to start up.
To understand what application boot time even means, I summarized below how I approached this.
Implementation
What hook can we use to decide that the app is fully booted?
In a pre-fork server like Puma, the primary process runs through rackup, boots Rails, pre-loads classes, then forks into worker processes. In this case, we are only interested in the boot time for the primary process since forking is comparatively cheap and will not yield useful results.
In single-process environments like Sidekiq, but also the Rails console, for instance, we are largely interested in the point in time when the last initializer ran.
For both we can leverage the on_master_start
life-cycle hook. In a single-proc environment, this will run immediately when called, i.e. during the last initializer. In a pre-fork env, it will run when the primary has fully started.
What point in time can we use as t0?
We need to decide what is the "time zero" event from which onward we track the time elapsed until the "boot finished" event occurs. This is not something we can easily do in the app itself. Instead, we can leverage the Linux kernel stats to tell us how much time elapsed since the process started. Linux maintains system and process statistics in procfs
; particularly the /proc/uptime
and /proc/self/stat
files. The former is simply the system uptime in seconds. The latter is trickier; this file contains an array that in turn contains the starttime
field for this process:
(22) starttime %llu
The time the process started after system boot. In kernels before Linux 2.6, this value was expressed in jiffies. Since Linux 2.6, the value is expressed in clock ticks (divide by sysconf(_SC_CLK_TCK)).
The tricky bit here is that:
- This is a timestamp, i.e. a fixed value that does not move every time we look at it.
- The unit is clock ticks, not wall time.
So what we need to do is:
- Take this value and divide it by the system's clock ticks (which is measured in ticks per second), so we have
elapsed_ticks / clock_ticks / second
which yields a value measured in seconds. - Subtract it from
uptime
, which is already in seconds.
I found an implementation of this in the linux_stat
gem, but it uses a C extension that does not compile on macOS (procfs
does not exist on Darwin), which many GitLab contributors use. This is unfortunate, since there is a lot of utility in this gem.
Where is it exposed?
I expose the boot time in 3 places for maximum visibility:
- In the Rails console
- In the application logs (structured JSON)
- In prometheus via the
gitlab_rails_boot_time_seconds
gauge
Feature flag
We ended up putting this behind the track_application_boot_time
feature flag, which is on
by default. The reason being that this code path is only evaluated during startup of the primary process, so any change in the FF state would require a restart, i.e. turning it on is not equivalent to it activating.
Since we have confidence in the change from testing, I decided to make it default ON, so that we can turn it off in case a container crash-loops, unblocking a deployment.
This would never reach users in a broken state, since it would prevent the Rails process from starting.
Test coverage
Because I touched console_message
, undercoverage
started to complain about a lack of test coverage. So I took the opportunity to split this into two pieces:
- An initializer that merely patches IRB as it did before
- A new module
Gitlab::Console
that contains the actual logic
The latter is now covered with tests for CE and EE.
Screenshots or screen recordings
Console boot time message
Note: This will only be displayed on Linux.
Prometheus metrics
You will notice that the time for workers is always 0; this is by design, since workers reset their metrics after forking, and this metrics wouldn't make a lot of sense in a pre-fork env anyway (see above.)
How to set up and validate locally
NOTE: The current implementation will not work on macOS, since it requires access to procfs
and Ruby constants that are only present on supported GitLab platforms (that means Linux.) On macOS, application boot time will be 0
.
You can run this in a Docker container instead if you're on Darwin. Just run the Rails console, boot Puma or Sidekiq and inspect the application logs or Prometheus.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #321973 (closed)