AI Summary
This paper addresses the lack of modular communication modeling and the limited reliability analysis in distributed consensus networks. We propose a generic communication abstraction framework for crash- and Byzantine-fault-tolerant protocols that unifies the core communication components of RAFT, single-decree Paxos, PBFT, and HotStuff. On top of this framework, we develop a method based on probabilistic graphical models to quantify consensus reliability, which allows failure probability and latency to be evaluated jointly under link loss and node failures. We further design two protocol-level latency optimization mechanisms and implement a communication layer with a controllable failure rate. Experiments on a RAFT prototype support the theoretical analysis and show that the proposed optimizations reduce end-to-end latency. The framework provides a reusable modeling and design foundation for consensus systems with low failure rates and low latency.
Abstract
In this paper, we propose a modularized framework for the communication processes of crash and Byzantine fault-tolerant consensus protocols. We abstract basic communication components and show that the communication processes of classic consensus protocols such as RAFT, single-decree Paxos, PBFT, and HotStuff can be represented as combinations of these components. Based on the proposed framework, we develop an approach to analyze the consensus reliability of different protocols, in which link loss and node failure are modeled as probabilities. We propose two latency optimization methods and implement a RAFT system to verify our theoretical analysis and the effectiveness of these methods. We also discuss how the consensus failure rate can be decreased by adjusting protocol designs. This paper provides theoretical guidance for the design of future consensus systems that achieve a low consensus failure rate and low latency under possible communication loss.
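
To make the idea of treating link loss and node failure as probabilities concrete, the following is a minimal sketch and not the paper's model. It assumes a single leader-driven replication round in the style of RAFT, a leader that stays alive, followers that fail independently with probability `p_node_fail`, and messages that are lost independently with probability `p_link_loss` in each direction. The function names and parameters are hypothetical and chosen only for illustration.

```python
from math import comb


def follower_ack_probability(p_node_fail: float, p_link_loss: float) -> float:
    """Probability that one follower's acknowledgement reaches the leader.

    Assumption (not from the paper): the follower must be alive, and both
    the request and the reply must survive independent link loss.
    """
    return (1.0 - p_node_fail) * (1.0 - p_link_loss) ** 2


def round_failure_probability(n: int, p_node_fail: float, p_link_loss: float) -> float:
    """Probability that a leader in an n-node cluster misses a majority.

    The leader counts its own vote, so it needs n // 2 acknowledgements
    from the other n - 1 nodes. Acknowledgements are modeled as independent
    Bernoulli trials, which gives a binomial tail sum.
    """
    if n < 1:
        raise ValueError("cluster size must be at least 1")
    s = follower_ack_probability(p_node_fail, p_link_loss)
    followers = n - 1
    needed = n // 2
    p_success = sum(
        comb(followers, k) * s**k * (1.0 - s) ** (followers - k)
        for k in range(needed, followers + 1)
    )
    return 1.0 - p_success


if __name__ == "__main__":
    # Illustrative inputs only; these are not values reported in the paper.
    for n in (3, 5, 7):
        print(n, round_failure_probability(n, p_node_fail=0.01, p_link_loss=0.05))
```

Under these simplifying assumptions, the round fails only when fewer than `n // 2` followers acknowledge. A fuller analysis would also have to account for leader failure, retransmissions, and the multi-phase message patterns of PBFT and HotStuff.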