Optimizing RabbitMQ CPU scheduling
Summary
In a normal Yaook OpenStack deployment you will be running at least three RabbitMQ clusters (for Cinder, Nova and Neutron). Based on our current labeling strategy it is quite probable that these three clusters share a common pool of hardware (e.g. three nodes). However, the current CPU scheduling strategy within Erlang does not work well when multiple Erlang runtime systems run on the same host.
Use cases
I would like to run all three RabbitMQ clusters under load without them breaking regularly.
Details about Erlang scheduling
The following is a summary of https://www.erlang.org/doc/man/erl.html#+sbt.
The official RabbitMQ image sets +stbt db as a launch option for Erlang.
This option defines how the Erlang scheduler threads (the threads actually running the Erlang code) are distributed over the physical CPU cores.
By default, Erlang spawns one scheduler thread per logical core (i.e. per hyperthread, or per physical core if hyperthreading is disabled).
The setting db translates to thread_no_node_processor_spread, which essentially means that each scheduler thread is pinned to one specific logical core in a predetermined order.
There are some more details to this specific setting, but they are not relevant for this issue (if interested please see the docs).
When possible, the Erlang runtime tries to run processes on the scheduler with the lowest ID. As scheduler IDs map consistently to logical cores, this means that all three RabbitMQ processes will, whenever possible, try to run code on the same logical core.
The Erlang documentation already mentions that this can cause performance issues:
If the Erlang runtime system is the only operating system process that binds threads to logical processors, this improves the performance of the runtime system. However, if other operating system processes (for example another Erlang runtime system) also bind threads to logical processors, there can be a performance penalty instead. This performance penalty can sometimes be severe. If so, you are advised not to bind the schedulers.
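To see what a running node actually does, one can query the Erlang runtime through rabbitmqctl eval; a sketch (the system_info/1 keys are part of stock Erlang/OTP, the exact output depends on the node's launch options):

```shell
# Ask the running RabbitMQ node for its scheduler bind type
# (e.g. thread_no_node_processor_spread when +stbt db is in effect).
rabbitmqctl eval 'erlang:system_info(scheduler_bind_type).'

# Show which logical core each scheduler thread is bound to
# ('unbound' entries mean no pinning is in effect).
rabbitmqctl eval 'erlang:system_info(scheduler_bindings).'
```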
Proposal
There are two options for solving this issue:
Option 1: Disable CPU pinning
The Erlang documentation already mentions that if we run multiple Erlang processes on the same host, we should not bind the schedulers to logical cores.
For this we could set the environment variable RABBITMQ_SCHEDULER_BIND_TYPE to u (unbound).
The Linux kernel then takes care of scheduling the Erlang scheduler threads onto logical cores.
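As a minimal sketch, assuming the stock rabbitmq image (which picks up RABBITMQ_SCHEDULER_BIND_TYPE from the environment), the unbound setting could be passed like this; container name and image tag are only examples:

```shell
# Run RabbitMQ with scheduler binding disabled ('u' = unbound),
# leaving thread placement to the Linux kernel scheduler.
docker run -d --name rabbitmq \
  -e RABBITMQ_SCHEDULER_BIND_TYPE=u \
  rabbitmq:3

# Equivalently, on a non-containerized node, in rabbitmq-env.conf
# (variables there are written without the RABBITMQ_ prefix):
# SCHEDULER_BIND_TYPE=u
```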
Option 2: Pin different Erlang processes to different physical cores
We could leave the pinning of Erlang schedulers enabled and instead ensure that the different Erlang processes/pods do not share physical cores.
To do this we could set requests and limits on the RabbitMQ pods and set cpuManagerPolicy: static on the nodes running RabbitMQ. (Docs).
However, this would require the RabbitMQ pods to run on dedicated nodes and custom configuration on those nodes.
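A rough sketch of what Option 2 would involve, assuming a StatefulSet named rabbitmq (a hypothetical name): with cpuManagerPolicy: static on the kubelet, a pod only gets exclusive cores if it is in the Guaranteed QoS class with integer CPU requests, i.e. requests equal to limits:

```shell
# Give each RabbitMQ pod integer CPU requests equal to its limits so it
# lands in the Guaranteed QoS class; combined with cpuManagerPolicy: static
# on the kubelet, such pods are assigned exclusive cores.
kubectl -n rabbitmq set resources statefulset/rabbitmq \
  --requests=cpu=2,memory=4Gi \
  --limits=cpu=2,memory=4Gi
```

Note that cpuManagerPolicy itself is kubelet configuration, so it has to be set per node, which is what makes this option operationally heavier than Option 1.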
I propose going with Option 1, as the implementation is a lot simpler and might already be enough to solve our issues.
Specification
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this issue are to be interpreted in the spirit of RFC 2119, even though we're not technically doing protocol design.