AI Summary
Existing Raft reconfiguration schemes typically rely on a centralized coordinator and stop the entire cluster during reconfiguration, which introduces a single point of failure and correctness risks. This paper proposes ReCraft, a coordinator-free dynamic reconfiguration mechanism that supports split and merge operations as well as fine-grained membership changes. Its core is a self-contained, multi-level reconfiguration protocol, realized through extensions to the Raft state machine and redesigned consensus logic, and formally verified for safety and liveness using TLA+. Implemented in etcd, ReCraft blocks only the log submissions that must wait during reconfiguration, incurs less than 8% throughput degradation, reduces split/merge latency by 57%, and eliminates the single point of failure inherent in centralized coordination.
Abstract
Designing reconfiguration schemes for consensus protocols is challenging because subtle corner cases during reconfiguration could invalidate the correctness of the protocol. Thus, most systems that embed consensus protocols conservatively implement the reconfiguration and refrain from developing an efficient scheme. Existing implementations often stop the entire system during reconfiguration and rely on a centralized coordinator, which can become a single point of failure. We present ReCraft, a novel reconfiguration protocol for Raft, which supports multi- and single-cluster-level reconfigurations. ReCraft does not rely on external coordinators and blocks minimally. ReCraft enables the sharding of Raft clusters with split and merge reconfigurations and adds a membership change scheme that improves Raft. We prove the safety and liveness of ReCraft and demonstrate its efficiency through implementations in etcd.