# 🚀 RabbitMQ Cluster on Kubernetes (Complete Setup + Troubleshooting Guide)
---
# 📌 Objective
Deploy a **3-node RabbitMQ Cluster** on Kubernetes with:
* High Availability
* Persistent Storage (NFS)
* Auto Clustering
* Management UI
* Application connectivity (Tomcat)
---
# 🏗️ Components Created
## 1. Persistent Volumes (NFS)
We created 3 PVs:
* pv-rabbitmq1
* pv-rabbitmq2
* pv-rabbitmq3
Each mapped to:
```text
/data/nfsshared/rabbitmq-pv1
/data/nfsshared/rabbitmq-pv2
/data/nfsshared/rabbitmq-pv3
```
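Access mode used:
```yaml
accessModes:
  - ReadWriteOnce
```
👉 `ReadWriteOnce` allows read-write mounting by a single node at a time. The **1 pod = 1 storage** mapping comes from the StatefulSet's `volumeClaimTemplates`, which creates one PVC per pod.

---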
## 2. ConfigMap
Contains:
### enabled_plugins
```erlang
[rabbitmq_management,rabbitmq_peer_discovery_k8s].
```
### rabbitmq.conf
```ini
cluster_formation.peer_discovery_backend = k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.k8s.service_name = service-rabbitmq-headless
cluster_formation.k8s.hostname_suffix = .service-rabbitmq-headless.default.svc.cluster.local
cluster_formation.node_cleanup.interval = 10
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = autoheal
queue_master_locator=min-masters
```
👉 Enables **automatic peer discovery through the Kubernetes API**

---
## 3. RBAC (CRITICAL)
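```text
ServiceAccount → rabbitmq
Role           → get / list / watch on pods and endpoints
RoleBinding    → binds the Role to the ServiceAccount
```
👉 Required because:
```text
The k8s peer discovery plugin calls the Kubernetes API → the pod's ServiceAccount needs read permission
```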
---
## 4. Headless Service
```yaml
name: service-rabbitmq-headless
clusterIP: None
publishNotReadyAddresses: true
```
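👉 Gives each pod a stable DNS name:
```text
rabbitmq-0.service-rabbitmq-headless                             (within the same namespace)
rabbitmq-0.service-rabbitmq-headless.default.svc.cluster.local   (fully qualified)
```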
---
## 5. NodePort Service (UI)
```yaml
port: 15672
nodePort: 30072
```
👉 Access UI:
```text
http://<NodeIP>:30072
```
---
## 6. ClusterIP Service (App)
```yaml
name: rabbitmq-svc
port: 5672
```
👉 Used by:
```text
Tomcat → rabbitmq-svc:5672
```
---
## 7. StatefulSet
Key points:
```yaml
serviceName: service-rabbitmq-headless
replicas: 3
```
### ENV:
```text
RABBITMQ_DEFAULT_USER=admin
RABBITMQ_DEFAULT_PASS=admin
RABBITMQ_ERLANG_COOKIE=mysecretcookie
RABBITMQ_USE_LONGNAME=true
```
### Volumes:
* PVC → /var/lib/rabbitmq
* ConfigMap → rabbitmq.conf + plugins
👉 Ensures:
* Stable identity
* Persistent data
* Config-driven clustering
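
The credentials and Erlang cookie listed under ENV are written in plain text in the StatefulSet. A minimal sketch of moving them into a Secret follows. The Secret name `rabbitmq-credentials` and its key names are hypothetical and not part of the original manifests; the values simply mirror the ones used in this guide.

```yaml
# Hypothetical Secret: the name and key names are illustrative assumptions
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-credentials
  namespace: default
type: Opaque
stringData:
  username: admin
  password: admin                 # replace before any real use
  erlang-cookie: mysecretcookie   # must be identical on every node
---
# Fragment (not a full manifest) for the container "env" list in
# StatefulSet-rabbitmq.yaml, replacing the three literal values
env:
  - name: RABBITMQ_DEFAULT_USER
    valueFrom:
      secretKeyRef:
        name: rabbitmq-credentials
        key: username
  - name: RABBITMQ_DEFAULT_PASS
    valueFrom:
      secretKeyRef:
        name: rabbitmq-credentials
        key: password
  - name: RABBITMQ_ERLANG_COOKIE
    valueFrom:
      secretKeyRef:
        name: rabbitmq-credentials
        key: erlang-cookie
```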
---
# ⚙️ FINAL EXECUTION ORDER (VERY IMPORTANT)
👉 Always follow this order:
```bash
kubectl apply -f pv-rabbit-01.yaml
kubectl apply -f pv-rabbit-02.yaml
kubectl apply -f pv-rabbit-03.yaml
kubectl apply -f rbac-rabbitmq.yaml
kubectl apply -f configmap-rabbit.yaml
kubectl apply -f service-rabbitmq-headless.yaml
kubectl apply -f service-rabbitmq-svc.yaml
kubectl apply -f service-rabbitmq-nodeport.yaml
kubectl apply -f StatefulSet-rabbitmq.yaml
```
---
# 🔥 TROUBLESHOOTING JOURNEY
---
## ❌ Issue 1: DNS Not Working
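Problem:
```text
rabbitmq-1 not resolving
```
Cause: by default a headless service only publishes DNS records for pods that are Ready, so peers cannot find each other while they are still booting.
Fix (on the headless service):
```yaml
publishNotReadyAddresses: true
```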
---
## ❌ Issue 2: Service Name Mismatch
Problem:
```text
rabbitmq-headless vs service-rabbitmq-headless
```
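Fix:
```text
The Service name, the StatefulSet serviceName, cluster_formation.k8s.service_name
and cluster_formation.k8s.hostname_suffix must all use EXACTLY the same name
```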
---
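## ❌ Issue 3: No rabbitmq.conf
Problem: without a `rabbitmq.conf`, no Kubernetes peer discovery backend is configured, so every node boots on its own.
Fix: mounted `rabbitmq.conf` from the ConfigMap with the `cluster_formation.*` settings from section 2.

---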
## ❌ Issue 4: 403 Error (CRITICAL)
Log:
```text
Failed to fetch nodes from Kubernetes API: 403
```
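Fix: added the ServiceAccount, Role and RoleBinding from section 3 and set `serviceAccountName: rabbitmq` on the StatefulSet.

---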
## ❌ Issue 5: Short vs Long Names
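Error:
```text
epmd nxdomain
```
Cause: a bare pod hostname such as `rabbitmq-1` does not resolve from other pods. Only the fully qualified name behind the headless service does, so the nodes need long (FQDN) node names.
Fix:
```text
RABBITMQ_USE_LONGNAME=true
```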
---
## ❌ Issue 6: Cluster Join Failure
Error:
```text
tables_not_present
mnesia_not_running
```
👉 Root cause:
```text
Pods not ready at same time (timing issue)
```
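
One way to narrow this timing window, sketched here as an assumption rather than the fix applied in this guide, is a readiness probe on the RabbitMQ container. The StatefulSet in this guide defines no probe, so under the default `OrderedReady` pod management each pod counts as Ready as soon as its container starts, and the next pod can attempt to join while the previous node is still booting. The probe timings below are illustrative guesses to tune. One caveat: after a full cluster restart, a node that is waiting for its peers may never pass this probe under `OrderedReady`, in which case `podManagementPolicy: Parallel` is worth evaluating.

```yaml
# Fragment for spec.template.spec.containers[0] in StatefulSet-rabbitmq.yaml
# (not in the original manifest; all timings are assumptions)
readinessProbe:
  exec:
    # Succeeds once the RabbitMQ application on this node is running
    command: ["rabbitmq-diagnostics", "-q", "check_running"]
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 6
```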
---
## ❌ Issue 7: Cluster Not Forming
Final log:
```text
Starting as a blank standalone node
```
👉 Reason:
```text
Retry failed → node becomes standalone
```
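
The retry behaviour behind this log line is configurable. As a hedged sketch (this guide does not record which retry settings were actually used), `rabbitmq.conf` accepts `cluster_formation.discovery_retry_limit` and `cluster_formation.discovery_retry_interval`. The values below are illustrative assumptions and would be added to the `rabbitmq.conf` key of `configmap-rabbit`.

```yaml
# Fragment of configmap-rabbit.yaml; only the two retry settings are new
data:
  rabbitmq.conf: |
    cluster_formation.peer_discovery_backend = k8s
    # (existing cluster_formation.k8s.* settings stay unchanged)
    # Number of discovery attempts before giving up (assumed value)
    cluster_formation.discovery_retry_limit = 30
    # Pause between attempts, in milliseconds (assumed value)
    cluster_formation.discovery_retry_interval = 2000
```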
---
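# 🧠 WHY THIS HAPPENS
RabbitMQ (3.12, as used here):
```text
Peer discovery runs ONLY when a node boots with an empty data directory
```
If no peer is reachable at that moment, the join fails and the node starts as a standalone cluster of one. Its data directory is then initialized, so a plain restart will not retry discovery.

---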
# 🔧 FINAL FIXES APPLIED
* Enabled RBAC ✅
* Enabled longnames ✅
* Fixed serviceName ✅
* Fixed DNS ✅
* Added retry logic ✅
* Restarted pods cleanly ✅
---
# 📊 FINAL RESULT
```bash
rabbitmqctl cluster_status
```
Output:
```text
rabbit@rabbitmq-0
rabbit@rabbitmq-1
rabbit@rabbitmq-2
```
---
# 🎯 WHAT WE ACHIEVED
✅ 3-node RabbitMQ cluster
✅ Auto discovery via Kubernetes
✅ Persistent storage
✅ UI access
✅ App connectivity
✅ HA-ready setup
---
# ⚠️ ALTERNATIVES
| Approach | Result |
| -------------- | ------------------------ |
| No RBAC | No clustering ❌ |
| Manual join | Works but not stable ⚠️ |
| Classic config | Static, not scalable ❌ |
| Helm chart | Best production option ✅ |
---
# 🧠 FINAL LEARNING
* Kubernetes = dynamic → needs API
* RabbitMQ = startup-based clustering
* RBAC = mandatory
* Headless service = must
* Longnames = required
* Timing = critical
---
# 🚀 NEXT STEPS
* Create quorum queues
* Test failover (kill pod)
* Connect Tomcat producer/consumer
* Monitor cluster
---
# 📌 FINAL CONCLUSION
You successfully built a **production-grade RabbitMQ cluster on Kubernetes**
and solved real-world issues like:
* DNS
* RBAC
* Clustering
* Node naming
* Startup timing
---
## configmap-rabbit.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-rabbit
  labels:
    type: configmap-rabbit
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
    cluster_formation.peer_discovery_backend = k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    cluster_formation.k8s.service_name = service-rabbitmq-headless
    cluster_formation.k8s.hostname_suffix = .service-rabbitmq-headless.default.svc.cluster.local
    cluster_formation.node_cleanup.interval = 10
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    queue_master_locator=min-masters
```
## pv-rabbit-01.yaml
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-rabbitmq1
  labels:
    type: pv-rabbitmq
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - sec=sys
    - nfsvers=4.1
    - hard
  nfs:
    server: controlnode
    path: /data/nfsshared/rabbitmq-pv1
    readOnly: false
```
## pv-rabbit-02.yaml
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-rabbitmq2
  labels:
    type: pv-rabbitmq
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - sec=sys
    - nfsvers=4.1
    - hard
  nfs:
    server: controlnode
    path: /data/nfsshared/rabbitmq-pv2
    readOnly: false
```
## pv-rabbit-03.yaml
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-rabbitmq3
  labels:
    type: pv-rabbitmq
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - sec=sys
    - nfsvers=4.1
    - hard
  nfs:
    server: controlnode
    path: /data/nfsshared/rabbitmq-pv3
    readOnly: false
```
## rbac-rabbitmq.yaml
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rabbitmq
  namespace: default
rules:
  - apiGroups: [""]
    resources:
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rabbitmq
  namespace: default
subjects:
  - kind: ServiceAccount
    name: rabbitmq
    namespace: default
roleRef:
  kind: Role
  name: rabbitmq
  apiGroup: rbac.authorization.k8s.io
```
## service-rabbitmq-headless.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-rabbitmq-headless
  labels:
    type: service-rabbitmq-headless
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: rabbitmq
  ports:
    - name: amqp
      port: 5672
    - name: management
      port: 15672
    - name: epmd
      port: 4369
    - name: cluster-rpc
      port: 25672
```
## service-rabbitmq-nodeport.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-nodeport
  labels:
    type: rabbitmq-nodeport
spec:
  type: NodePort
  selector:
    app: rabbitmq
  ports:
    - name: management
      port: 15672
      targetPort: 15672
      nodePort: 30072
```
## service-rabbitmq-svc.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-svc
  labels:
    type: rabbitmq-svc
spec:
  type: ClusterIP
  selector:
    app: rabbitmq
  ports:
    - name: amqp
      port: 5672
      targetPort: 5672
```
## StatefulSet-rabbitmq.yaml
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq
  labels:
    type: rabbitmq
spec:
  serviceName: service-rabbitmq-headless
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      serviceAccountName: rabbitmq
      containers:
        - name: rabbitmq
          image: rabbitmq:3.12-management
          ports:
            - containerPort: 5672
            - containerPort: 15672
          env:
            - name: RABBITMQ_DEFAULT_USER
              value: "admin"
            - name: RABBITMQ_DEFAULT_PASS
              value: "admin"
            - name: RABBITMQ_ERLANG_COOKIE
              value: "mysecretcookie"
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
          volumeMounts:
            - name: data
              mountPath: /var/lib/rabbitmq
            - name: config
              mountPath: /etc/rabbitmq/enabled_plugins
              subPath: enabled_plugins
            - name: rabbitconf
              mountPath: /etc/rabbitmq/rabbitmq.conf
              subPath: rabbitmq.conf
      volumes:
        - name: config
          configMap:
            name: configmap-rabbit
        - name: rabbitconf
          configMap:
            name: configmap-rabbit
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: ""
        resources:
          requests:
            storage: 1Gi
        selector:
          matchLabels:
            type: pv-rabbitmq
```
# 👍 END