🤖 AI Summary
This work addresses two problems in multi-agent debate systems: shared memory is vulnerable to contamination, and existing methods cannot reliably identify erroneous entries. The authors propose a zero-trust memory game framework that, for the first time, uses game-theoretic equilibrium as an objective measure of memory credibility, eliminating reliance on large language model judgments. They further introduce an algorithmic calibration mechanism that draws on agent-generated retrieval queries and traversal paths and is compatible with both embedding-based and graph-structured memory architectures. Experiments show that the approach consistently outperforms existing safeguards across diverse benchmarks, multi-agent debate frameworks, and memory architectures, remains robust against adversarial agents, and incurs negligible inference overhead.
📝 Abstract
Multi-agent debate (MAD) systems increasingly rely on shared memory to support long-horizon reasoning, but this convenience opens a critical vulnerability: a single corrupted entry can contaminate downstream memory-augmented reasoning, and debate alone fails to filter such errors. Existing safeguards filter entries via heuristics or LLM-based validation, yet they rely on AI judgments that share the agents' own failure modes, and they overlook the cross-agent dynamics of MAD. We address this gap by formulating memory updating in MAD as a zero-trust memory game, in which no agent is assumed honest and the game's equilibrium serves as an indicator of optimal memory trust. Guided by this equilibrium, we propose EquiMem, an inference-time calibration mechanism that quantifies each update algorithmically against the shared memory state, using agents' existing retrieval queries and traversal paths as evidence rather than soliciting any LLM judgment. EquiMem instantiates calibration for both embedding- and graph-based memory. Across diverse benchmarks, MAD frameworks, and memory architectures, it consistently outperforms existing safeguards, remains robust under adversarial agents, and incurs negligible inference overhead.
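
To make the idea of scoring a memory update algorithmically, without an LLM judge, more concrete, here is a minimal sketch for the embedding-based case. It is not EquiMem's actual procedure, which the abstract does not specify. The scoring rule (mean cosine similarity between a candidate update and the queries agents have already issued), the acceptance threshold, and the names `support_score` and `calibrate` are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def support_score(update_vec, query_vecs):
    """Hypothetical evidence score: mean similarity between a candidate
    memory update and the retrieval queries agents have already issued."""
    if not query_vecs:
        return 0.0
    return sum(cosine(update_vec, q) for q in query_vecs) / len(query_vecs)


def calibrate(updates, query_vecs, threshold=0.5):
    """Keep only updates whose score reaches the (assumed) threshold.

    `updates` maps an update id to its embedding. No update is trusted
    by default, and no LLM is asked to judge any entry.
    """
    return [
        uid for uid, vec in updates.items()
        if support_score(vec, query_vecs) >= threshold
    ]


# Toy example with 2-D embeddings: update "a" aligns with the queries, "b" does not.
updates = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
queries = [[1.0, 0.0], [0.9, 0.1]]
accepted = calibrate(updates, queries)
```

In this toy run only update `a` is accepted. A graph-based variant would presumably draw its evidence from agents' traversal paths rather than from query embeddings, but the abstract gives no details on that either.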