etcd and leader election in LavinMQ

LavinMQ has just published another pre-release of version 2.0 (currently v2.0.0-rc.4), bringing exciting new features! The highlight is clustering support, which enhances high availability.

The update introduces data replication, ensuring all nodes in the cluster maintain the same, up-to-date information. Once a LavinMQ leader is elected, it replicates log entries - such as published messages, message acknowledgements and metadata changes - to follower nodes. This replication keeps all nodes synchronized, so even if a node fails, it can quickly catch up with the cluster once it recovers.

Leader election and node failure in LavinMQ

LavinMQ integrates with etcd. At CloudAMQP, we have configured one etcd node to connect with each LavinMQ node. etcd’s role is to manage leader elections for the cluster and keep a list of replicas in sync. When a LavinMQ node connects, it requests its role in the cluster, and that information is stored in etcd. While etcd’s key-value store tracks which LavinMQ node is the leader and maintains the replica list, LavinMQ itself handles data replication, ensuring consistency across the cluster.
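To make that concrete, here is a minimal sketch of how such bookkeeping could be read back from etcd with the official Go client. The key names (/lavinmq/leader and the /lavinmq/replicas/ prefix) are assumptions for illustration, not LavinMQ's actual schema.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Connect to the local etcd node that sits next to this LavinMQ node.
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()

        // Hypothetical key holding the current leader's identity.
        leader, err := cli.Get(ctx, "/lavinmq/leader")
        if err != nil {
            panic(err)
        }
        for _, kv := range leader.Kvs {
            fmt.Printf("leader: %s\n", kv.Value)
        }

        // Hypothetical prefix listing the replicas that are kept in sync.
        replicas, err := cli.Get(ctx, "/lavinmq/replicas/", clientv3.WithPrefix())
        if err != nil {
            panic(err)
        }
        for _, kv := range replicas.Kvs {
            fmt.Printf("replica: %s = %s\n", kv.Key, kv.Value)
        }
    }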

etcd offers a user-friendly API for leader election and uses the Raft consensus algorithm for this process. With this foundation in place, let's dive into how leader election works.
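As a rough sketch of that API, the following shows how a node could campaign for leadership using the Go client's concurrency helpers. The election prefix and node name are made up for the example; LavinMQ's own integration (written in Crystal) may structure this differently.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
        "go.etcd.io/etcd/client/v3/concurrency"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // A session holds a lease and keeps it alive in the background.
        session, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
        if err != nil {
            panic(err)
        }
        defer session.Close()

        // All candidates campaign under the same election prefix (assumed name).
        election := concurrency.NewElection(session, "/lavinmq/election")

        // Campaign blocks until this node becomes the leader.
        if err := election.Campaign(context.Background(), "node-1"); err != nil {
            panic(err)
        }
        fmt.Println("node-1 is now the leader")

        // If the lease expires (for example because the process dies), leadership
        // is lost and another candidate's Campaign call can succeed.
        <-session.Done()
    }

Running the same program on several nodes, each with its own value, means only one Campaign call returns at a time; the others block until the current leader's session ends.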

Heartbeat and keep-alive signals in LavinMQ

When a leader node fails, whether due to a node-level issue (e.g., a server crash) or a failure in LavinMQ itself on the leader node (e.g., a software crash), etcd detects the failure through regular heartbeats and keep-alive signals. etcd uses heartbeats to monitor node health, while the LavinMQ leader requests a lease from etcd and sends keep-alives to prove it is still active. If those keep-alives stop, the lease expires, and etcd recognizes a potential failure.

So, the current leader keeps sending these signals to maintain its leadership status. If they stop, etcd recognizes the leader is down and initiates a new election to select a replacement.
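At the API level, this lease-and-keep-alive mechanism looks roughly like the sketch below; the 5-second TTL and the key name are example values, not LavinMQ's real configuration.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Request a lease with a 5-second TTL.
        lease, err := cli.Grant(context.Background(), 5)
        if err != nil {
            panic(err)
        }

        // Attach the leader key to the lease (key name assumed for illustration).
        _, err = cli.Put(context.Background(), "/lavinmq/leader", "node-1",
            clientv3.WithLease(lease.ID))
        if err != nil {
            panic(err)
        }

        // Send keep-alives for as long as the process is healthy. If these stop,
        // the lease expires, the key disappears, and etcd sees a failed leader.
        keepAlive, err := cli.KeepAlive(context.Background(), lease.ID)
        if err != nil {
            panic(err)
        }
        for resp := range keepAlive {
            fmt.Printf("lease %d renewed, TTL %d\n", resp.ID, resp.TTL)
        }
    }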

If a follower doesn’t receive a signal within a specific timeout period, it assumes the leader has failed and transitions from being a follower to becoming a candidate. The candidate then initiates an election, asking other nodes to vote for it as the new leader. Each node can only vote for one candidate, ensuring only one leader is selected in the election.

To become the leader, a candidate must secure votes from a majority of the nodes. If successful, the candidate becomes the leader. If no candidate achieves a majority, the election process restarts.
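Since etcd runs the voting itself, the surrounding system mostly just reacts to the outcome. Below is a hedged sketch of watching for leadership changes with the Go client's Observe helper, using the same assumed election prefix as before.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
        "go.etcd.io/etcd/client/v3/concurrency"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        session, err := concurrency.NewSession(cli)
        if err != nil {
            panic(err)
        }
        defer session.Close()

        election := concurrency.NewElection(session, "/lavinmq/election")

        // Observe streams the current leader and every subsequent change,
        // so a follower learns immediately when a new leader has been elected.
        for resp := range election.Observe(context.Background()) {
            fmt.Printf("current leader: %s\n", resp.Kvs[0].Value)
        }
    }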

When a LavinMQ follower node goes down and later rejoins the cluster, it needs to catch up with the leader's current state, even if it’s behind by millions of messages. To do this, the follower receives a snapshot and a list of logs from the leader, requests the missing logs, and updates its state. Once the follower is up-to-date, a final checksum ensures full synchronization. The leader briefly pauses operations during this final check to make sure everything is aligned, though this pause is usually minimal.
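The catch-up flow can be pictured with the toy model below; every type and function here is hypothetical and only mirrors the steps in the text, not LavinMQ's actual replication code.

    package main

    import (
        "crypto/sha256"
        "fmt"
        "strings"
    )

    // node is a hypothetical stand-in for a cluster member's replicated log.
    type node struct {
        logs []string // replicated log entries, index = position
    }

    // checksum summarizes the node's state for the final synchronization check.
    func (n *node) checksum() [32]byte {
        return sha256.Sum256([]byte(strings.Join(n.logs, "\n")))
    }

    // catchUp brings a rejoining follower up to date with the leader.
    func catchUp(leader, follower *node) error {
        // 1. The follower knows how far it got before going down.
        lastIndex := len(follower.logs)

        // 2. It requests only the log entries it is missing and applies them.
        for i := lastIndex; i < len(leader.logs); i++ {
            follower.logs = append(follower.logs, leader.logs[i])
        }

        // 3. A final checksum, taken while the leader briefly pauses writes,
        //    confirms both nodes hold the same state.
        if follower.checksum() != leader.checksum() {
            return fmt.Errorf("checksum mismatch, catch-up must be retried")
        }
        return nil
    }

    func main() {
        leader := &node{logs: []string{"msg-1", "msg-2", "msg-3", "msg-4"}}
        follower := &node{logs: []string{"msg-1"}} // fell three entries behind
        if err := catchUp(leader, follower); err != nil {
            panic(err)
        }
        fmt.Printf("follower caught up, now at %d entries\n", len(follower.logs))
    }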

Log replication

After a LavinMQ leader is established, it takes charge of log replication to ensure all cluster nodes remain synchronized.

  • Log entries: The leader records client requests as log entries and transmits these entries to follower nodes.
  • Appending logs: Followers append these log entries to their logs and send an acknowledgment to the leader. The leader tracks which followers have successfully replicated the log entries and uses this information to determine how far behind each follower is (see the sketch after this list).
  • Commitment: A log entry is considered committed for a follower once that follower has acknowledged it.
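Below is a small, hypothetical sketch of the bookkeeping described in the list above: the leader records the highest index it has sent to, and received acknowledgments from, for each follower, which is enough to tell how far behind each one is. None of these names come from LavinMQ's code base.

    package main

    import "fmt"

    // replicationState is a hypothetical stand-in for the leader's bookkeeping.
    type replicationState struct {
        lastSent  map[string]int // highest log index sent to each follower
        lastAcked map[string]int // highest log index acknowledged by each follower
    }

    // onAck records a follower's acknowledgment; the acknowledged entry is then
    // considered committed for that follower. It returns the follower's lag.
    func (r *replicationState) onAck(follower string, index int) int {
        if index > r.lastAcked[follower] {
            r.lastAcked[follower] = index
        }
        return r.lastSent[follower] - r.lastAcked[follower]
    }

    func main() {
        r := &replicationState{
            lastSent:  map[string]int{"follower-1": 120, "follower-2": 120},
            lastAcked: map[string]int{"follower-1": 118, "follower-2": 95},
        }
        lag := r.onAck("follower-2", 110)
        fmt.Printf("follower-2 is %d entries behind\n", lag)
    }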

If a client connects to a follower instead of the leader, the follower will forward the requests to the leader. This can cause a slight performance hit compared to connecting the client directly to the leader.

Summary

LavinMQ's new clustering feature, powered by etcd, significantly enhances high availability and fault tolerance. etcd primarily manages leader election, using heartbeats and keep-alive signals to monitor cluster status. Once elected, the LavinMQ leader handles log replication, keeping all nodes in sync - this ensures the cluster remains operational even if a node fails.

At CloudAMQP, we're thrilled to be offering LavinMQ 2.0 very soon! More detailed information about LavinMQ clustering can be found on the LavinMQ blog.
