The Consensus Trap: Rescuing Multi-Agent LLMs from Adversarial Majorities via Token-Level Collaboration

📅 2026-04-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In open environments, multi-agent systems built on large language models are vulnerable to context contamination such as prompt injection attacks, and conventional response-level aggregation methods, such as majority voting, fail when compromised agents form a local majority. This work proposes Token-Level Round-Robin collaboration, the first mechanism to move multi-agent coordination from the response level to the token level: agents take turns generating one token at a time within a shared autoregressive context, replacing a static vote with an interleaved chain of reasoning. Theoretical analysis, which formulates the process as a product of nonlinear operators, shows that honest agents can exert enough influence to overcome an adversarial majority. Empirical results across diverse reasoning benchmarks show that the method maintains high accuracy even when malicious agents are in the majority, outperforming traditional aggregation strategies.
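The contrast between a vote and an operator product can be written down in a few lines. This is a notational sketch only: the symbols $y_i$, $T_i$, $x_t$, and the turn schedule $i(t)$ are assumptions chosen for illustration and are not taken from the paper.

```latex
% Response-level majority voting over n final answers y_1, ..., y_n
% (a sum of indicator votes per candidate answer y):
\hat{y}_{\mathrm{MAJ}} = \arg\max_{y} \sum_{i=1}^{n} \mathbf{1}[y_i = y]

% Token-level round-robin: at step t, agent i(t) applies its (nonlinear)
% next-token update T_{i(t)} to the shared context x_t:
x_{t+1} = T_{i(t)}(x_t), \qquad i(t) = (t \bmod n) + 1

% Unrolling T steps gives a composition (product) of operators:
x_{T} = \bigl(T_{i(T-1)} \circ \cdots \circ T_{i(1)} \circ T_{i(0)}\bigr)(x_0)
```

In the first form each agent contributes one independent vote, so a corrupted majority fixes the outcome. In the second, every update acts on a context that earlier agents, honest ones included, have already shaped.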

📝 Abstract

Multi-agent large language model (LLM) architectures increasingly rely on response-level aggregation, such as Majority Voting (MAJ), to raise reasoning ceilings. However, in open environments, agents are highly susceptible to stealthy contextual corruption, such as targeted prompt injections. We reveal a critical structural vulnerability in current multi-agent systems: response-level aggregation collapses when corrupted agents form a local majority. Because voting aggregates fully formed conclusions, it is blind to flawed intermediate logic. To overcome this systematic limitation, we propose Token-Level Round-Robin (RR) Collaboration, where agents sequentially interleave generation within a shared autoregressive context. We formalize this process as a discrete-time dynamical system, proving that token-level interleaving transitions aggregation from a brittle counting of final votes (a linear sum) to a dynamic, interwoven chain of logic (a non-linear operator product). Through this theoretical lens, we prove that the honest model's restorative pull can overpower adversarial corruptions, even when corrupted agents form a majority. We conduct an exhaustive empirical evaluation across diverse reasoning benchmarks and demonstrate that while MAJ collapses when corrupted agents reach a majority, RR maintains robust accuracy well beyond this critical threshold.
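The two aggregation styles described in the abstract can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` interface, `majority_vote`, `round_robin_generate`, and the stub agents are all hypothetical names, and real agents would be LLMs, each possibly conditioned on its own (clean or corrupted) private context.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical agent interface (an assumption for this sketch):
# an agent maps the shared context to a single next token.
Agent = Callable[[Sequence[str]], str]


def majority_vote(responses: Sequence[str]) -> str:
    """Response-level aggregation: return the most common final answer."""
    return Counter(responses).most_common(1)[0][0]


def round_robin_generate(
    agents: Sequence[Agent],
    prompt: Sequence[str],
    max_new_tokens: int,
    eos: str = "<eos>",
) -> List[str]:
    """Token-level aggregation: agents take turns appending one token
    to a single shared autoregressive context."""
    context = list(prompt)
    generated: List[str] = []
    for step in range(max_new_tokens):
        agent = agents[step % len(agents)]  # fixed round-robin schedule
        token = agent(context)              # agent sees all prior tokens
        if token == eos:
            break
        context.append(token)
        generated.append(token)
    return generated


def make_stub(tag: str) -> Agent:
    """Stub agent that labels each token with its tag and the context
    length, so the interleaving is visible in the output."""
    return lambda context: f"{tag}{len(context)}"


# One honest stub ("h") and two corrupted stubs ("c").
agents = [make_stub("h"), make_stub("c"), make_stub("c")]
tokens = round_robin_generate(agents, ["q"], 6)

# Response-level voting with the same 1-vs-2 split returns the
# corrupted answer, because only final answers are counted.
voted = majority_vote(["42", "7", "7"])
```

With the stubs above, `tokens` interleaves the honest and corrupted agents in turn order, and `voted` is the answer held by the two corrupted agents. The sketch shows only the mechanics of each scheme; whether the honest agent's tokens actually steer the shared context back toward a correct answer depends on the models involved, which is what the paper's analysis and experiments address.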

Problem

Research questions and friction points this paper is trying to address.

multi-agent LLMs

adversarial majority

response-level aggregation

contextual corruption

majority voting

Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-Level Collaboration

Multi-Agent LLMs

Adversarial Robustness