🤖 AI Summary
This work addresses distributed systems that must cope with Byzantine faults and recurrent transient faults, both malicious and benign, at the same time. We propose a self-stabilizing protocol for repeated Byzantine agreement, the primitive underlying state machine replication (SMR), built upon a distributed consensus algorithm that combines self-stabilization, threshold fault tolerance, and dynamic state verification to reach consistency from an arbitrary initial state. Our solution is the first to simultaneously guarantee four properties: (i) Byzantine fault tolerance against up to ⌈n/3⌉−1 malicious participants; (ii) recurrent transient fault tolerance against up to ⌈n/6⌉−1 additional malicious transient faults, or an even larger number of uniformly distributed random transient faults, in each agreement instance; (iii) accuracy, meaning that outputs lie within the interval of honest participants' inputs; and (iv) self-stabilizing recovery without a system restart. Because it supports this hybrid fault model, the protocol preserves consistency over repeated agreement instances, which makes it suited to SMR in highly dynamic, safety-critical environments.
📝 Abstract
The ability to perform repeated Byzantine agreement lies at the heart of important applications such as blockchain price oracles or replicated state machines. Any such protocol requires the following properties: (1) \textit{Byzantine fault-tolerance}, because not all participants can be assumed to be honest, (2) \textit{recurrent transient fault-tolerance}, because even honest participants may be subject to transient ``glitches'', (3) \textit{accuracy}, because the results of quantitative queries (such as price quotes) must lie within the interval of honest participants' inputs, and (4) \textit{self-stabilization}, because it is infeasible to reboot a distributed system following a fault. This paper presents the first protocol for repeated Byzantine agreement that satisfies the properties listed above. Specifically, starting in an arbitrary system configuration, our protocol establishes consistency. It preserves consistency in the face of up to $\lceil n/3 \rceil - 1$ Byzantine participants {\em and} constant recurring (``noise'') transient faults, of up to $\lceil n/6 \rceil - 1$ additional malicious transient faults, or even more than $\lceil n/6 \rceil - 1$ (uniformly distributed) random transient faults, in each repeated Byzantine agreement.
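
To make the thresholds concrete, the following minimal sketch evaluates the two fault budgets quoted in the abstract for a given number of participants and expresses the accuracy property as a simple predicate. It is not the paper's protocol: the function names `fault_budgets` and `within_honest_interval` are hypothetical, chosen only for illustration, and the sample price quotes are made up.

```python
from typing import Dict, Sequence


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b, avoiding floating-point rounding."""
    return -(-a // b)


def fault_budgets(n: int) -> Dict[str, int]:
    """Fault thresholds quoted in the abstract for n participants.

    Hypothetical helper: it only evaluates the two formulas
    ceil(n/3) - 1 and ceil(n/6) - 1.
    """
    return {
        "byzantine": ceil_div(n, 3) - 1,
        "malicious_transient": ceil_div(n, 6) - 1,
    }


def within_honest_interval(output: float, honest_inputs: Sequence[float]) -> bool:
    """Accuracy property: the output lies within the interval
    spanned by the honest participants' inputs."""
    return min(honest_inputs) <= output <= max(honest_inputs)


if __name__ == "__main__":
    for n in (7, 10, 13):
        print(n, fault_budgets(n))
    # Illustrative price quotes from honest participants (made-up values).
    quotes = [100.0, 101.5, 102.0]
    print(within_honest_interval(101.0, quotes))
    print(within_honest_interval(250.0, quotes))
```

For example, with n = 10 the formulas give a budget of 3 Byzantine participants and 1 additional malicious transient fault per agreement instance.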