RabbitMQ pod-0 doesn't join the cluster

Summary

Since RabbitMQ 4.1, Kubernetes peer discovery makes all other pods join pod-0, while pod-0 itself always starts directly without trying to join anyone[1]. As a result, pod-0 forms a second, separate cluster if it starts fresh after its PVC was deleted (e.g. because the Kubernetes node was redeployed).

Should we reimplement the "old" discovery mechanism, i.e. check whether other pods already exist and, if so, join pod-0 to their cluster?

Detailed Description

Both the blog post[2] and an issue in the RabbitMQ cluster-operator repository[3] state that this behaviour is more or less expected and that the node should be joined manually (or its data should not be deleted, but that goes against the yaook way of redeploying everything).

Steps to reproduce the issue

  1. Delete the PVC of MQ pod-0
  2. Delete/Restart pod-0
  3. If needed: run rabbitmqctl forget_cluster_node rabbit@$node
  4. Wait until the node comes up
  5. Check rabbitmqctl cluster_status on pod-0 and any other MQ pod
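
The steps above could be scripted roughly as follows. This is only a sketch: the namespace, StatefulSet name, volume claim template name and Erlang node name are hypothetical parameters that depend on the actual deployment, and the function only builds the kubectl command lines instead of running them. The PVC name relies on the StatefulSet convention `<volumeClaimTemplate>-<pod name>`.

```python
from typing import List


def reproduction_commands(
    namespace: str,
    statefulset: str,
    claim_template: str,
    erlang_node: str,
) -> List[List[str]]:
    """Build the kubectl invocations for the reproduction steps, in order.

    All four arguments are deployment-specific assumptions, not values
    taken from this issue.
    """
    pod0 = f"{statefulset}-0"
    peer = f"{statefulset}-1"
    # StatefulSet PVCs are named <volumeClaimTemplate>-<pod name>.
    pvc0 = f"{claim_template}-{pod0}"
    kubectl = ["kubectl", "-n", namespace]
    return [
        # 1. Delete the PVC of pod-0. --wait=false because the PVC stays in
        #    Terminating until the pod that uses it is gone.
        kubectl + ["delete", "pvc", pvc0, "--wait=false"],
        # 2. Delete pod-0 so the StatefulSet recreates it with an empty volume.
        kubectl + ["delete", "pod", pod0],
        # 3. Possibly needed: let a remaining member forget the old pod-0.
        kubectl + ["exec", peer, "--", "rabbitmqctl",
                   "forget_cluster_node", erlang_node],
        # 4. Wait until the recreated pod-0 is Ready.
        kubectl + ["wait", "--for=condition=Ready", f"pod/{pod0}",
                   "--timeout=300s"],
        # 5. Compare the membership seen by pod-0 and by another pod.
        kubectl + ["exec", pod0, "--", "rabbitmqctl", "cluster_status"],
        kubectl + ["exec", peer, "--", "rabbitmqctl", "cluster_status"],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; step 3 may fail harmlessly if the old node was never registered or is already forgotten.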

Result

pod-0 starts its own cluster and does not join the existing one. Queues are not synced to pod-0. Restarting other MQ pods will probably fail as well, as their queues would then be out of quorum.
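
To detect this state automatically, one could compare the membership reported by pod-0 with the membership reported by another pod. The sketch below assumes the input is the output of `rabbitmqctl cluster_status --formatter json` and that it contains the keys `disk_nodes` and `ram_nodes`; these key names are an assumption and should be checked against the RabbitMQ version in use.

```python
import json
from typing import Set


def cluster_members(status_json: str) -> Set[str]:
    """Return the member node names from one cluster_status JSON document."""
    status = json.loads(status_json)
    # Assumed keys: "disk_nodes" and "ram_nodes", each a list of node names.
    return set(status.get("disk_nodes", [])) | set(status.get("ram_nodes", []))


def is_split(status_pod0: str, status_peer: str) -> bool:
    """True if the two nodes report different cluster memberships."""
    return cluster_members(status_pod0) != cluster_members(status_peer)
```

If `is_split` returns True for pod-0 and any other MQ pod, pod-0 has formed its own cluster as described above.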

Expected Result

Pod-0 should join the cluster?

Additional Information

[1] https://www.rabbitmq.com/docs/cluster-formation#kubernetes-peer-discovery-overview

[2] https://www.rabbitmq.com/blog/2025/04/04/new-k8s-peer-discovery#kubernetes-peer-discovery-in-rabbitmq-41

[3] https://github.com/rabbitmq/cluster-operator/issues/1957#issuecomment-3304012484

Resolution

  1. Adjust the MQ StatefulSet to check whether other pods already exist and, if so, join the node to their cluster.
  2. Don't join automatically, but build tooling (e.g. in yaookctl) to join the node again.
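
Both options need the same two building blocks: a decision whether pod-0 should join someone, and the command sequence to rejoin. A minimal sketch, assuming the caller already knows whether the node has local data and which peers are currently clustered (how to obtain that is not covered here). `choose_join_target` and `rejoin_commands` are hypothetical helpers, not part of RabbitMQ or yaookctl; the rabbitmqctl sequence itself is the usual manual procedure (stop_app, reset, join_cluster, start_app).

```python
from typing import List, Optional


def choose_join_target(
    own_node: str,
    has_local_data: bool,
    clustered_peers: List[str],
) -> Optional[str]:
    """Pick a peer to join, or None if the node should start directly.

    Hypothetical policy: a node that still has its data keeps its
    membership; an empty node joins the lowest-named existing peer.
    """
    if has_local_data:
        return None
    candidates = sorted(p for p in clustered_peers if p != own_node)
    return candidates[0] if candidates else None


def rejoin_commands(target: str) -> List[List[str]]:
    """rabbitmqctl calls to run on the stray node to join `target`."""
    return [
        ["rabbitmqctl", "stop_app"],
        # reset wipes the node's local state, including its own
        # single-node cluster.
        ["rabbitmqctl", "reset"],
        ["rabbitmqctl", "join_cluster", target],
        ["rabbitmqctl", "start_app"],
    ]
```

For option 1 the decision would run before RabbitMQ starts (e.g. in an init step of the StatefulSet); for option 2 the tooling would run `rejoin_commands` on pod-0 after an operator confirms the target.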

Proposal

To be discussed.

Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this issue are to be interpreted in the spirit of RFC 2119, even though we're not technically doing protocol design.

Edited Feb 03, 2026 by Stefan Hoffmann