Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing large language model (LLM) approaches struggle to detect deep logic flaws in consensus protocols. These flaws arise from complex state dependencies that span multiple execution stages, and they can violate safety properties. This work proposes Agora, a domain-aware multi-agent verification framework built on role-specialized agent collaboration. By integrating hypothesis-driven testing, domain-constrained state-space exploration, and iterative validation, Agora reasons systematically about global protocol invariants rather than analyzing one function at a time. Evaluated on implementations of four consensus protocols (Raft, EPaxos, HotStuff, and BullShark) with four state-of-the-art LLMs, Agora uncovers 15 previously unknown protocol-level logic bugs that violate safety properties, none of which were detected by existing LLM-based agents.

📝 Abstract

Consensus protocols form the backbone of distributed systems and blockchains, where implementation bugs can cause data corruption and financial losses. While LLM-based approaches show promise in code analysis, they struggle with deep protocol-level logic bugs involving complex state-dependent behaviors across multiple execution stages. We present Agora, a domain-aware multi-agent framework that integrates hypothesis-driven testing with LLM capabilities for systematic protocol verification. Agora employs specialized agents that collaboratively explore protocol state spaces, synthesize attack scenarios using domain-specific constraints, and validate findings through iterative refinement. This explicit role separation enables reasoning about global protocol invariants beyond single-function code analysis. We evaluate Agora on four consensus implementations (Raft, EPaxos, HotStuff, BullShark) using four state-of-the-art LLMs. Agora discovers 15 previously unknown protocol-level logic bugs that violate safety properties, while existing LLM-based agents fail to detect any such protocol-level logic bugs. Our results demonstrate that domain-aware multi-agent collaboration is essential for detecting deep logic bugs in complex protocols.
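To make the role separation described in the abstract more concrete, here is a minimal sketch of a hypothesis, scenario, and validation loop. It is an illustration under stated assumptions, not Agora's implementation: the names `ToyVoter`, `propose_hypotheses`, `synthesize_scenarios`, `validate`, and `find_violations` are hypothetical, the "agents" are plain functions where a real system would call an LLM, and the target is a toy vote-granting rule rather than any of the evaluated protocols. The refinement step is omitted for brevity.

```python
from dataclasses import dataclass, field
from itertools import product

N_VOTERS = 3  # assumed toy cluster size


@dataclass
class ToyVoter:
    """Hypothetical voter; when `buggy`, it forgets whom it voted for in a term."""
    buggy: bool
    voted_for: dict = field(default_factory=dict)  # term -> candidate

    def request_vote(self, term: int, candidate: str) -> bool:
        # A correct voter grants at most one candidate per term.
        if not self.buggy and term in self.voted_for:
            return self.voted_for[term] == candidate
        self.voted_for[term] = candidate
        return True


def propose_hypotheses() -> list:
    """Stand-in for a hypothesis agent (an LLM call in a real system)."""
    return [{"name": "two leaders elected in one term",
             "term": 1, "candidates": ("A", "B")}]


def synthesize_scenarios(hypothesis: dict):
    """Stand-in for a scenario agent. Domain constraint (assumed): each
    candidate asks each voter exactly once, so only the per-voter arrival
    order of the two requests varies."""
    a, b = hypothesis["candidates"]
    for firsts in product((a, b), repeat=N_VOTERS):
        scenario = []
        for voter, first in enumerate(firsts):
            second = b if first == a else a
            scenario += [(voter, first), (voter, second)]
        yield scenario


def validate(hypothesis: dict, scenario: list, buggy: bool) -> bool:
    """Stand-in for a validation agent: replay the scenario and check the
    safety invariant 'at most one candidate wins a majority per term'.
    Returns True when the invariant is violated."""
    voters = [ToyVoter(buggy) for _ in range(N_VOTERS)]
    tally = {c: 0 for c in hypothesis["candidates"]}
    for voter, candidate in scenario:
        if voters[voter].request_vote(hypothesis["term"], candidate):
            tally[candidate] += 1
    leaders = [c for c, votes in tally.items() if votes > N_VOTERS // 2]
    return len(leaders) > 1


def find_violations(buggy: bool) -> list:
    """Report every hypothesis confirmed by at least one replayed scenario."""
    findings = []
    for hypothesis in propose_hypotheses():
        scenarios = synthesize_scenarios(hypothesis)
        if any(validate(hypothesis, s, buggy) for s in scenarios):
            findings.append(hypothesis["name"])
    return findings
```

Calling `find_violations(buggy=True)` reports the double-leader hypothesis, while `find_violations(buggy=False)` reports nothing. The point of the sketch is that the invariant is checked over a whole replayed scenario, which is the kind of global reasoning that per-function analysis does not provide.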

Problem

Research questions and friction points this paper is trying to address.

consensus protocols

protocol-level logic bugs

state-dependent behaviors

safety properties

distributed systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent LLM framework

protocol-level bug detection

hypothesis-driven testing