Initialisers that reconfigure the working environment should use LifecycleEvents

This issue and "Sidekiq cluster should preload before forking" should be in the same milestone.

Problem to solve

Currently, our initialisers perform a number of different operations to initialise the application:

  1. Patching existing classes or mixing modules into them
  2. Configuring gems, for example: ActsAsTaggableOn.tags_counter = false
  3. Reconfiguring connection pools based on the execution context (Puma, Sidekiq)
  4. Connecting to the database to read application-specific settings, such as feature flags, and acting on them

Items 3 and 4 are the problematic ones. Preloading is valuable because it is a cheap way to share memory pages across forked processes. The problem arises when we unintentionally open connections (sockets) or spawn threads as part of the initialisation process.
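A minimal Ruby sketch of the socket half of this problem, assuming a Unix platform where fork is available. A pipe opened before forking stands in for a connection an initialiser might open during preload; the child can still use it because a forked process inherits the parent's open file descriptors.

```ruby
# Sketch (Unix only): an FD opened before fork() is inherited by the child.
# IO.pipe stands in for a socket an initialiser might open during preload.
reader, writer = IO.pipe

pid = fork do
  # Child: both pipe ends were inherited from the parent.
  reader.close
  writer.write("written by child #{Process.pid} via an inherited FD")
  writer.close
  exit!(0) # skip at_exit handlers inherited from the parent
end

writer.close            # parent drops its copy of the write end
message = reader.read   # blocks until the child closes its copy (EOF)
reader.close
Process.wait(pid)

puts message
```

With a real database or Redis socket, parent and child would share one connection in the same way, which is why such connections should not be opened during preload.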

We could therefore distinguish two types of initialisation:

  1. Primitive: patching or mixing into existing classes, and configuring gems
  2. Expensive: evaluating the execution context, reconfiguring, and reading external configuration

The two types of initialisation differ structurally:

  1. Primitive initialisation rewrites existing memory pages, but we can effectively assume the result is stable and does not change afterwards, so those pages can be shared safely after forking.
  2. Expensive initialisation usually creates ephemeral connections and leaves highly context-dependent state in memory pages. This does not work well with forking: a forked child inherits the parent's open file descriptors, but only the thread that called fork() continues to run in it. If a socket is not marked SOCK_CLOEXEC (Puma also iterates over all open FDs to close them explicitly when forking), or if threads were started, we might end up with orphaned objects that consume memory and are never freed.
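The thread half of the problem can be sketched the same way, again assuming a Unix platform. A thread started before forking stands in for a background poller spawned by an initialiser; in the child its Thread object still exists but is dead, while the parent's copy keeps running.

```ruby
# Sketch (Unix only): threads started before fork() do not run in the child.
poller = Thread.new { loop { sleep 0.01 } } # stands in for a background poller

reader, writer = IO.pipe
pid = fork do
  reader.close
  # Only the forking thread survives; the inherited Thread object is dead here.
  writer.write("#{poller.alive?},#{Thread.list.size}")
  writer.close
  exit!(0)
end
writer.close
child_report = reader.read
reader.close
Process.wait(pid)

parent_alive = poller.alive? # still running in the parent
poller.kill
puts "parent alive: #{parent_alive}, child report: #{child_report}"
```

Anything the dead thread was holding or building in the child (buffers, caches, queues) is left behind as the kind of orphaned state described above.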

Proposal

My proposal is to distinguish primitive and expensive initialisers and execute each at the appropriate point:

  1. Primitive initialisers can always run.
  2. Primitive initialisation must not perform process discovery (determining whether it is running under Puma, Unicorn, or Sidekiq).
  3. Expensive initialisers may run only as part of the on_master_start or on_worker_start lifecycle events; they may open connections, spawn threads, and perform process discovery.
  4. Ideally, initialisers should be prevented from creating threads or opening connections during primitive initialisation.
  5. Resources are cleaned up after expensive initialisation runs.
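Point 4 could be enforced with a guard along the lines of the sketch below. Everything in it (PrimitivePhaseGuard, during_primitive, ViolationError) is hypothetical and not an existing GitLab API. It only intercepts Thread.new, by prepending a module to Thread's singleton class; Thread.start, Thread.fork, and opening connections are not covered here.

```ruby
# Hypothetical guard: raises if code starts a thread while the process is in
# the "primitive" initialisation phase. Names are illustrative, not GitLab APIs.
module PrimitivePhaseGuard
  class ViolationError < StandardError; end

  @active = false

  class << self
    def active?
      @active
    end

    # Runs the block with the guard enabled, then restores the previous state.
    def during_primitive
      previous = @active
      @active = true
      yield
    ensure
      @active = previous
    end
  end

  # Prepended to Thread's singleton class so that it intercepts Thread.new.
  # Thread.start and Thread.fork are NOT intercepted in this sketch.
  module ThreadCreationCheck
    def new(*args, **kwargs, &block)
      if PrimitivePhaseGuard.active?
        raise PrimitivePhaseGuard::ViolationError,
              "threads must not be started during primitive initialisation"
      end
      super(*args, **kwargs, &block)
    end
  end
end

Thread.singleton_class.prepend(PrimitivePhaseGuard::ThreadCreationCheck)
```

Primitive initialisers would be loaded inside PrimitivePhaseGuard.during_primitive { ... }, so an accidental Thread.new fails loudly at boot instead of silently leaving a thread that will not survive the fork.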

This would let us model application preloading more accurately and control precisely when each operation runs.

How would this affect you?

Not much. You would only have to wrap expensive initialisation in a lifecycle event within config/initializers:

Gitlab::Cluster::LifecycleEvents.on_master_start do
  # expensive initialisation goes here
end

The block would then be executed at the appropriate point in the process lifecycle.
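The deferral itself can be sketched with a simplified registry. This is NOT the real Gitlab::Cluster::LifecycleEvents implementation; LifecycleEventsSketch, do_master_start, and do_worker_start are hypothetical names chosen for illustration. Registering a hook during preload only records the block, and the server runs the hooks later, once in the master and once per forked worker.

```ruby
# Hypothetical, simplified registry illustrating deferred "expensive"
# initialisation. Not the actual Gitlab::Cluster::LifecycleEvents code.
module LifecycleEventsSketch
  @master_start_hooks = []
  @worker_start_hooks = []

  class << self
    # Called from config/initializers: records the block, runs nothing yet.
    def on_master_start(&block)
      @master_start_hooks << block
    end

    def on_worker_start(&block)
      @worker_start_hooks << block
    end

    # Called by the server in the master once preloading has finished.
    def do_master_start
      @master_start_hooks.each(&:call)
    end

    # Called by the server in each forked worker.
    def do_worker_start
      @worker_start_hooks.each(&:call)
    end
  end
end
```

Because nothing runs at registration time, connections and threads created inside these hooks are never part of the preloaded memory image that workers inherit.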

Context

I noticed this problem while working on improving the memory efficiency of Puma and Sidekiq:
