Initialisers that reconfigure working environment should use LifecycleEvents



This issue and "Sidekiq cluster should preload before forking" should be in the same milestone.

Problem to solve

Currently our initialisers perform a number of different operations to initialise the application:

1. Mocking/mixing existing classes
2. Configuring gems, like: ActsAsTaggableOn.tags_counter = false
3. Reconfiguring connection pools based on the execution context (Puma, Sidekiq)
4. Connecting to the DB to read application-specific settings, like features, and acting on them

Items 3 and 4 are the problematic ones. Process preloading is meant to be a cheap way to share memory pages across forked processes. Problems arise if we unintentionally open connections (sockets) or spawn threads as part of the initialisation process.

We could then probably distinguish two types of initialisation:

Primitive: mocking/mixing existing and configuring gems
Expensive: evaluating execution context, reconfiguring and reading external configuration.

The problem is the structural difference between the primitive and expensive types of initialisation:

The primitive type rewrites existing memory pages, but we can effectively assume that the result is stable and does not change afterwards.
The expensive type requires creating a (usually ephemeral) connection and introduces highly context-dependent variability in memory pages. This does not work well with forking: unless a socket is marked as SOCK_CLOEXEC (Puma also iterates over all open FDs to close them explicitly when forking), or if threads have been started, we might end up with orphaned objects that consume memory and will never be freed.
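To make the forking concern concrete, here is a minimal sketch (plain Ruby standard library, not GitLab code) of what a forked child inherits when a socket was opened and a thread was started before the fork. The names `server` and `worker` are illustrative only. It assumes a platform where `fork` is available (Linux/macOS, CRuby).

```ruby
require 'socket'

# Stand-ins for what an "expensive" initialiser might do before the fork.
server = TCPServer.new('127.0.0.1', 0) # opens a socket (a file descriptor)
worker = Thread.new { sleep }          # spawns a background thread

reader, writer = IO.pipe

pid = fork do
  reader.close
  # Only the thread that called fork keeps running in the child, so the
  # inherited Thread object is dead, while the socket is still open.
  writer.puts "thread_alive=#{worker.alive?} socket_closed=#{server.closed?}"
  writer.close
  exit!(0)
end

writer.close
child_report = reader.read.strip
reader.close
Process.wait(pid)

parent_report = "thread_alive=#{worker.alive?} socket_closed=#{server.closed?}"
puts "parent: #{parent_report}"
puts "child:  #{child_report}"

worker.kill
server.close
```

The child keeps an open copy of the socket but none of the threads, so any object that relied on its background thread is left behind without anything servicing it. Note that close-on-exec flags only take effect on `exec`, not on a plain `fork`.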

Proposal

My proposal is to distinguish between the primitive and expensive types of initialiser and execute each of them at the appropriate time.

primitive initialisers can always be run
primitive initialisers cannot do process discovery (whether they are running under Puma, Unicorn or Sidekiq)
expensive initialisers can be run only as part of the lifecycle events on_master_start or on_worker_start, and can open connections, spawn threads, or do process discovery
ideally, initialisers should be prevented from creating new threads or opening connections during primitive operation
we clean up resources after running expensive initialisation

This would allow us to better model the application preloads and execute relevant operations with much greater control.
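As a rough illustration of the rules above, the following is a hypothetical sketch of a registry that runs primitive initialisers immediately, defers expensive ones to a named lifecycle event, and rejects primitive initialisers that leave threads behind. `InitialiserPhases`, `ThreadLeak`, and the `:master_start`/`:worker_start` keys are invented for this example; this is not how `Gitlab::Cluster::LifecycleEvents` is implemented, and detecting opened connections is left out.

```ruby
# Hypothetical registry, for illustration only.
module InitialiserPhases
  ThreadLeak = Class.new(StandardError)

  # Deferred blocks, keyed by the lifecycle event that should run them.
  @expensive = { master_start: [], worker_start: [] }

  class << self
    # Primitive initialisers run right away and must not leave new threads behind.
    def primitive(name)
      before = Thread.list
      yield
      leaked = Thread.list - before
      return if leaked.empty?

      leaked.each(&:kill)
      raise ThreadLeak, "primitive initialiser #{name} started #{leaked.size} thread(s)"
    end

    # Expensive initialisers are only recorded here; nothing runs yet.
    def expensive(event, &block)
      @expensive.fetch(event) << block
    end

    # Called by the process manager at the matching lifecycle event.
    def run(event)
      @expensive.fetch(event).each(&:call)
    end
  end
end

log = []
InitialiserPhases.primitive(:configure_gem) { log << :primitive }
InitialiserPhases.expensive(:worker_start) { log << :expensive }
log_before_event = log.dup

InitialiserPhases.run(:worker_start)
log_after_event = log.dup

leak_detected =
  begin
    InitialiserPhases.primitive(:bad) { Thread.new { sleep } }
    false
  rescue InitialiserPhases::ThreadLeak
    true
  end
```

Comparing `Thread.list` before and after the block is a deliberately simple check: it only catches threads that are still alive when the block returns.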

How would it affect you?

Not much. You would only have to wrap the expensive type of initialisation in a lifecycle event within config/initializers:

Gitlab::Cluster::LifecycleEvents.on_master_start do
  # run the expensive initialisation here
end

This would be executed at the appropriate time.

Context

I noticed this kind of problem while looking into improving the memory efficiency of Puma and Sidekiq:

Preload sidekiq in sidekiq-cluster mode: #215317
Puma web and actioncable concurrently: #214788 (comment 328809780)
Puma preload PoC: !30144 (closed)

Edited Jul 06, 2025 by 🤖 GitLab Bot 🤖