abhilashthale.tech

    by abhilashthale - April 26, 2026

    # 🚀 RabbitMQ Cluster on Kubernetes (Complete Setup + Troubleshooting Guide)

    ---

    # 📌 Objective

    Deploy a **3-node RabbitMQ Cluster** on Kubernetes with:

    * High Availability
    * Persistent Storage (NFS)
    * Auto Clustering
    * Management UI
    * Application connectivity (Tomcat)

    ---

    # 🏗️ Components Created

    ## 1. Persistent Volumes (NFS)

    We created 3 PVs:

    * pv-rabbitmq1
    * pv-rabbitmq2
    * pv-rabbitmq3

    Each mapped to:

    ```text
    /data/nfsshared/rabbitmq-pv1
    /data/nfsshared/rabbitmq-pv2
    /data/nfsshared/rabbitmq-pv3
    ```

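    Each PV uses:

    ```yaml
    accessModes:
      - ReadWriteOnce
    ```

    👉 `ReadWriteOnce` lets a single node mount the volume read-write. The **1 pod = 1 volume** mapping comes from the StatefulSet's `volumeClaimTemplates`, which creates one PVC per pod, each bound to one of these PVs.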
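
    As a quick check that every pod really got its own volume, the sketch below lists the PVC names a StatefulSet derives from its claim template (`<template>-<statefulset>-<ordinal>`) and asks `kubectl` for the phase of each claim. The helper names `expected_pvcs` and `check_pvcs_bound` are made up for this example; the template name `data` and the StatefulSet name `rabbitmq` come from the manifests in this post.

    ```shell
    # Print the PVC names Kubernetes generates for a StatefulSet:
    #   <claim-template>-<statefulset>-<ordinal>
    expected_pvcs() {
      template="$1"; sts="$2"; replicas="$3"
      i=0
      while [ "$i" -lt "$replicas" ]; do
        echo "${template}-${sts}-${i}"
        i=$((i + 1))
      done
    }

    # Report every claim whose phase is not "Bound".
    # Returns non-zero if at least one such claim is found.
    check_pvcs_bound() {
      rc=0
      for pvc in $(expected_pvcs "$@"); do
        phase=$(kubectl get pvc "$pvc" -o jsonpath='{.status.phase}' 2>/dev/null || true)
        if [ "$phase" != "Bound" ]; then
          echo "$pvc is ${phase:-missing}"
          rc=1
        fi
      done
      return "$rc"
    }
    ```

    Usage: `check_pvcs_bound data rabbitmq 3` prints nothing when all three claims are bound.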

    ---

    ## 2. ConfigMap

    Contains:

    ### enabled_plugins

    ```erlang
    [rabbitmq_management,rabbitmq_peer_discovery_k8s].
    ```

    ### rabbitmq.conf

    ```ini
    cluster_formation.peer_discovery_backend = k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.address_type = hostname
    cluster_formation.k8s.service_name = service-rabbitmq-headless
    cluster_formation.k8s.hostname_suffix = .service-rabbitmq-headless.default.svc.cluster.local
    cluster_formation.node_cleanup.interval = 10
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    queue_master_locator = min-masters
    ```

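    👉 Enables **automatic peer discovery through the Kubernetes API**: each node asks the API for the endpoints of `service-rabbitmq-headless` and builds its peers' node names from them.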
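
    To make the role of `hostname_suffix` concrete: with `address_type = hostname`, the plugin takes each pod hostname from the Service endpoints, appends the suffix, and prefixes the node basename (`rabbit` by default). The sketch below only reproduces that string-building in shell; `peer_node_name` is a hypothetical helper written for this post, not part of RabbitMQ.

    ```shell
    # Build the Erlang node name a peer is expected to have:
    #   <basename>@<pod-hostname><hostname_suffix>
    peer_node_name() {
      pod="$1"
      suffix="$2"
      basename="${3:-rabbit}"   # RabbitMQ's default node basename
      printf '%s@%s%s\n' "$basename" "$pod" "$suffix"
    }

    SUFFIX=".service-rabbitmq-headless.default.svc.cluster.local"
    for p in rabbitmq-0 rabbitmq-1 rabbitmq-2; do
      peer_node_name "$p" "$SUFFIX"
    done
    ```

    Each printed name has to match what the node calls itself, which is why the part after `@` must be the pod's own fully qualified hostname.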

    ---

    ## 3. RBAC (CRITICAL)

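    ```text
    ServiceAccount → rabbitmq
    Role → get/list/watch on pods and endpoints
    RoleBinding → binds the Role to the ServiceAccount
    ```

    👉 Required because:

    ```text
    The peer discovery plugin calls the Kubernetes API from inside the pod → the pod's ServiceAccount needs permission to read endpoints
    ```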
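
    One way to confirm the binding before starting any pod is `kubectl auth can-i` with impersonation of the ServiceAccount. The wrapper below is a sketch: `can_read_endpoints` is a hypothetical name, and the caller needs permission to impersonate ServiceAccounts.

    ```shell
    # Ask the API server whether the ServiceAccount may read Endpoints,
    # which is what the peer discovery plugin does at boot.
    # kubectl prints "yes" or "no" and exits non-zero when the answer is no.
    can_read_endpoints() {
      ns="${1:-default}"
      sa="${2:-rabbitmq}"
      kubectl auth can-i get endpoints \
        --namespace "$ns" \
        --as "system:serviceaccount:${ns}:${sa}"
    }
    ```

    Usage: `can_read_endpoints default rabbitmq`. A `no` here usually shows up later as the 403 in the RabbitMQ log.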

    ---

    ## 4. Headless Service

    ```yaml
    name: service-rabbitmq-headless
    clusterIP: None
    publishNotReadyAddresses: true
    ```

    👉 Gives every pod a stable DNS name such as:

    ```text
    rabbitmq-0.service-rabbitmq-headless
    ```

    ---

    ## 5. NodePort Service (UI)

    ```yaml
    port: 15672
    nodePort: 30072
    ```

    👉 Access UI:

    ```text
    http://<NodeIP>:30072
    ```

    ---

    ## 6. ClusterIP Service (App)

    ```yaml
    name: rabbitmq-svc
    port: 5672
    ```

    👉 Used by:

    ```text
    Tomcat → rabbitmq-svc:5672
    ```

    ---

    ## 7. StatefulSet

    Key points:

    ```yaml
    serviceName: service-rabbitmq-headless
    replicas: 3
    ```

    ### ENV:

    ```text
    RABBITMQ_DEFAULT_USER=admin
    RABBITMQ_DEFAULT_PASS=admin
    RABBITMQ_ERLANG_COOKIE=mysecretcookie
    RABBITMQ_USE_LONGNAME=true
    ```

    ### Volumes:

    * PVC → /var/lib/rabbitmq
    * ConfigMap → rabbitmq.conf + plugins

    👉 Ensures:

    * Stable identity
    * Persistent data
    * Config-driven clustering

    ---

    # ⚙️ FINAL EXECUTION ORDER (VERY IMPORTANT)

    👉 Always follow this order, so that storage, permissions, config, and Services exist before the first pod boots:

    ```bash
    kubectl apply -f pv-rabbit-01.yaml
    kubectl apply -f pv-rabbit-02.yaml
    kubectl apply -f pv-rabbit-03.yaml

    kubectl apply -f rbac-rabbitmq.yaml

    kubectl apply -f configmap-rabbit.yaml

    kubectl apply -f service-rabbitmq-headless.yaml
    kubectl apply -f service-rabbitmq-svc.yaml
    kubectl apply -f service-rabbitmq-nodeport.yaml

    kubectl apply -f StatefulSet-rabbitmq.yaml
    ```
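
    The same order can be scripted so that a failed step stops the rollout instead of letting the StatefulSet start without its dependencies. A minimal sketch, with `apply_in_order` as a hypothetical helper:

    ```shell
    # Apply manifests one by one, in the order given, and stop at the first failure.
    apply_in_order() {
      for manifest in "$@"; do
        echo "applying $manifest"
        kubectl apply -f "$manifest" || {
          echo "failed on $manifest" >&2
          return 1
        }
      done
    }
    ```

    Called with the nine file names in the order listed above, it never reaches `StatefulSet-rabbitmq.yaml` if an earlier manifest is rejected. `kubectl rollout status statefulset/rabbitmq` can then be used to wait for the pods.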

    ---

    # 🔥 TROUBLESHOOTING JOURNEY

    ---

    ## ❌ Issue 1: DNS Not Working

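    Problem:

    ```text
    rabbitmq-1 not resolving
    ```

    Cause: by default a headless Service publishes DNS records only for pods that are Ready, so peers cannot find each other while they are still booting.

    Fix (on the headless Service):

    ```yaml
    publishNotReadyAddresses: true
    ```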
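
    To see which peers a pod can actually resolve, a loop like the one below runs `getent hosts` inside one pod for every peer name. It is a sketch: `check_peer_dns` is a hypothetical helper, and it assumes `getent` is available in the container image.

    ```shell
    # From inside pod $1, try to resolve every pod of the StatefulSet through
    # the headless Service. Print the names that fail and return non-zero.
    check_peer_dns() {
      from="$1"; sts="$2"; replicas="$3"; domain="$4"
      rc=0
      i=0
      while [ "$i" -lt "$replicas" ]; do
        host="${sts}-${i}.${domain}"
        if ! kubectl exec "$from" -- getent hosts "$host" >/dev/null 2>&1; then
          echo "unresolved: $host"
          rc=1
        fi
        i=$((i + 1))
      done
      return "$rc"
    }
    ```

    Usage: `check_peer_dns rabbitmq-0 rabbitmq 3 service-rabbitmq-headless.default.svc.cluster.local`.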

    ---

    ## ❌ Issue 2: Service Name Mismatch

    Problem:

    ```text
    rabbitmq-headless vs service-rabbitmq-headless
    ```

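    Fix:

    ```text
    The name must match EXACTLY in four places:
    Service metadata.name, StatefulSet serviceName,
    cluster_formation.k8s.service_name, and cluster_formation.k8s.hostname_suffix
    ```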

    ---

    ## ❌ Issue 3: No rabbitmq.conf

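    Problem: with no `rabbitmq.conf` mounted, the node boots with defaults and never attempts Kubernetes peer discovery.

    Fix: added the clustering settings to the ConfigMap and mounted them at `/etc/rabbitmq/rabbitmq.conf`.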

    ---

    ## ❌ Issue 4: 403 Error (CRITICAL)

    Log:

    ```text
    Failed to fetch nodes from Kubernetes API: 403
    ```

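    Fix: added the ServiceAccount, Role, and RoleBinding, and set `serviceAccountName: rabbitmq` in the StatefulSet pod template.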

    ---

    ## ❌ Issue 5: Short vs Long Names

    Error:

    ```text
    epmd nxdomain
    ```

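    Likely cause: the peer names built from `hostname_suffix` are fully qualified, while a node started with short names only handles hostnames without dots.

    Fix:

    ```text
    RABBITMQ_USE_LONGNAME=true
    ```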

    ---

    ## ❌ Issue 6: Cluster Join Failure

    Error:

    ```text
    tables_not_present
    mnesia_not_running
    ```

    👉 Root cause:

    ```text
    Timing issue: a node tried to join a peer whose Mnesia database had not started yet
    ```

    ---

    ## ❌ Issue 7: Cluster Not Forming

    Final log:

    ```text
    Starting as a blank standalone node
    ```

    👉 Reason:

    ```text
    Every join attempt failed → node gave up and initialised as standalone
    ```

    ---

    # 🧠 WHY THIS HAPPENS

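    RabbitMQ:

    ```text
    Peer discovery runs ONLY when a node boots with a blank data directory
    ```

    If no peer is reachable at that moment, the join fails and the node initialises as standalone. Because its data directory is then no longer blank, later restarts skip discovery, so the node stays standalone until it is reset or joined manually.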
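
    A node that already came up standalone can be brought back with the documented manual sequence `stop_app`, `reset`, `join_cluster`, `start_app`. The sketch below wraps it in `kubectl exec`. `rejoin_node` is a hypothetical helper and not necessarily what was run here; `reset` wipes the node's local data, so it only suits a node that holds nothing worth keeping.

    ```shell
    # Reset a standalone node and join it to an existing cluster member.
    #   $1 = pod to fix
    #   $2 = full node name of a healthy peer
    # The chain stops at the first command that fails.
    rejoin_node() {
      pod="$1"
      seed="$2"
      kubectl exec "$pod" -- rabbitmqctl stop_app &&
        kubectl exec "$pod" -- rabbitmqctl reset &&
        kubectl exec "$pod" -- rabbitmqctl join_cluster "$seed" &&
        kubectl exec "$pod" -- rabbitmqctl start_app
    }
    ```

    Usage: `rejoin_node rabbitmq-1 rabbit@rabbitmq-0.service-rabbitmq-headless.default.svc.cluster.local`.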

    ---

    # 🔧 FINAL FIXES APPLIED

    * Enabled RBAC ✅
    * Enabled longnames ✅
    * Fixed serviceName ✅
    * Fixed DNS ✅
    * Added retry logic ✅
    * Restarted pods cleanly ✅

    ---

    # 📊 FINAL RESULT

    ```bash
    rabbitmqctl cluster_status
    ```

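    Output (abbreviated; with long names enabled, each node name also carries the `.service-rabbitmq-headless.default.svc.cluster.local` suffix):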

    ```text
    rabbit@rabbitmq-0
    rabbit@rabbitmq-1
    rabbit@rabbitmq-2
    ```
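
    For scripts, `rabbitmqctl await_online_nodes` is easier to check than the human-readable status: it blocks until the given number of nodes are online or the timeout expires. A sketch, with `await_cluster` as a hypothetical wrapper:

    ```shell
    # Wait until at least $2 cluster nodes are online, as seen from pod $1.
    #   $3 = timeout in seconds (defaults to 120 here, an arbitrary choice)
    await_cluster() {
      pod="$1"
      count="$2"
      timeout="${3:-120}"
      kubectl exec "$pod" -- rabbitmqctl await_online_nodes "$count" --timeout "$timeout"
    }
    ```

    Usage: `await_cluster rabbitmq-0 3 60 && echo "3 nodes online"`.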

    ---

    # 🎯 WHAT WE ACHIEVED

    * 3-node RabbitMQ cluster ✅
    * Auto discovery via Kubernetes ✅
    * Persistent storage ✅
    * UI access ✅
    * App connectivity ✅
    * HA-ready setup ✅

    ---

    # ⚠️ ALTERNATIVES

    | Approach       | Result                   |
    | -------------- | ------------------------ |
    | No RBAC        | No clustering ❌          |
    | Manual join    | Works but not stable ⚠️  |
    | Classic config | Static, not scalable ❌   |
    | Helm chart     | Best production option ✅ |

    ---

    # 🧠 FINAL LEARNING

    * Kubernetes = dynamic → needs API
    * RabbitMQ = startup-based clustering
    * RBAC = mandatory
    * Headless service = must
    * Longnames = required
    * Timing = critical

    ---

    # 🚀 NEXT STEPS

    * Create quorum queues
    * Test failover (kill pod)
    * Connect Tomcat producer/consumer
    * Monitor cluster
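
    The failover test from the list above could be scripted roughly as follows. This is a sketch under assumptions: `failover_test` is a hypothetical helper, the 300-second timeouts are arbitrary, and `kubectl rollout status` may return early if the StatefulSet status has not yet caught up with the deletion, which is why the final check asks RabbitMQ itself.

    ```shell
    # Kill one pod, wait for the StatefulSet to report all replicas ready again,
    # then confirm from a surviving pod that every node is back online.
    #   $1 = pod to delete, $2 = pod to observe from,
    #   $3 = StatefulSet name, $4 = expected number of nodes
    failover_test() {
      victim="$1"; observer="$2"; sts="$3"; nodes="$4"
      kubectl delete pod "$victim" &&
        kubectl rollout status "statefulset/${sts}" --timeout=300s &&
        kubectl exec "$observer" -- rabbitmqctl await_online_nodes "$nodes" --timeout 300
    }
    ```

    Usage: `failover_test rabbitmq-1 rabbitmq-0 rabbitmq 3`.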

    ---

    # 📌 FINAL CONCLUSION

    You successfully built a **production-grade RabbitMQ cluster on Kubernetes**
    and solved real-world issues like:

    * DNS
    * RBAC
    * Clustering
    * Node naming
    * Startup timing

    ---

    # 📄 FULL MANIFESTS

    ## configmap-rabbit.yaml

    ```yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: configmap-rabbit
      labels:
        type: configmap-rabbit
    data:
      enabled_plugins: |
        [rabbitmq_management,rabbitmq_peer_discovery_k8s].

      rabbitmq.conf: |
        cluster_formation.peer_discovery_backend = k8s
        cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
        cluster_formation.k8s.address_type = hostname
        cluster_formation.k8s.service_name = service-rabbitmq-headless
        cluster_formation.k8s.hostname_suffix = .service-rabbitmq-headless.default.svc.cluster.local
        cluster_formation.node_cleanup.interval = 10
        cluster_formation.node_cleanup.only_log_warning = true
        cluster_partition_handling = autoheal
        queue_master_locator = min-masters
    ```

    ## pv-rabbit-01.yaml

    ```yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-rabbitmq1
      labels:
        type: pv-rabbitmq
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      mountOptions:
        - sec=sys
        - nfsvers=4.1
        - hard
      nfs:
        server: controlnode
        path: /data/nfsshared/rabbitmq-pv1
        readOnly: false
    ```

    ## pv-rabbit-02.yaml

    ```yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-rabbitmq2
      labels:
        type: pv-rabbitmq
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      mountOptions:
        - sec=sys
        - nfsvers=4.1
        - hard
      nfs:
        server: controlnode
        path: /data/nfsshared/rabbitmq-pv2
        readOnly: false
    ```

    ## pv-rabbit-03.yaml

    ```yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-rabbitmq3
      labels:
        type: pv-rabbitmq
    spec:
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      mountOptions:
        - sec=sys
        - nfsvers=4.1
        - hard
      nfs:
        server: controlnode
        path: /data/nfsshared/rabbitmq-pv3
        readOnly: false
    ```

    ## rbac-rabbitmq.yaml

    ```yaml
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: rabbitmq
      namespace: default
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: rabbitmq
      namespace: default
    rules:
      - apiGroups: [""]
        resources:
          - endpoints
          - pods
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: rabbitmq
      namespace: default
    subjects:
      - kind: ServiceAccount
        name: rabbitmq
        namespace: default
    roleRef:
      kind: Role
      name: rabbitmq
      apiGroup: rbac.authorization.k8s.io
    ```

    ## service-rabbitmq-headless.yaml

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: service-rabbitmq-headless
      labels:
        type: service-rabbitmq-headless
    spec:
      clusterIP: None
      publishNotReadyAddresses: true
      selector:
        app: rabbitmq
      ports:
        - name: amqp
          port: 5672
        - name: management
          port: 15672
        - name: epmd
          port: 4369
        - name: cluster-rpc
          port: 25672
    ```

    ## service-rabbitmq-nodeport.yaml

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: rabbitmq-nodeport
      labels:
        type: rabbitmq-nodeport
    spec:
      type: NodePort
      selector:
        app: rabbitmq
      ports:
        - name: management
          port: 15672
          targetPort: 15672
          nodePort: 30072
    ```


    ## service-rabbitmq-svc.yaml

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: rabbitmq-svc
      labels:
        type: rabbitmq-svc
    spec:
      type: ClusterIP
      selector:
        app: rabbitmq
      ports:
        - name: amqp
          port: 5672
          targetPort: 5672
    ```

    ## StatefulSet-rabbitmq.yaml

    ```yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: rabbitmq
      labels:
        type: rabbitmq
    spec:
      serviceName: service-rabbitmq-headless
      replicas: 3
      selector:
        matchLabels:
          app: rabbitmq
      template:
        metadata:
          labels:
            app: rabbitmq
        spec:
          serviceAccountName: rabbitmq
          containers:
            - name: rabbitmq
              image: rabbitmq:3.12-management
              ports:
                - containerPort: 5672
                - containerPort: 15672
              env:
                - name: RABBITMQ_DEFAULT_USER
                  value: "admin"
                - name: RABBITMQ_DEFAULT_PASS
                  value: "admin"
                - name: RABBITMQ_ERLANG_COOKIE
                  value: "mysecretcookie"
                - name: RABBITMQ_USE_LONGNAME
                  value: "true"
              volumeMounts:
                - name: data
                  mountPath: /var/lib/rabbitmq
                - name: config
                  mountPath: /etc/rabbitmq/enabled_plugins
                  subPath: enabled_plugins
                - name: rabbitconf
                  mountPath: /etc/rabbitmq/rabbitmq.conf
                  subPath: rabbitmq.conf
          volumes:
            - name: config
              configMap:
                name: configmap-rabbit
            - name: rabbitconf
              configMap:
                name: configmap-rabbit
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes:
              - ReadWriteOnce
            storageClassName: ""
            resources:
              requests:
                storage: 1Gi
            selector:
              matchLabels:
                type: pv-rabbitmq
    ```


    # 👍 END
     
