🤖 AI Summary
Existing multi-agent consensus methods rely on simplistic voting schemes that ignore internal belief inconsistencies, and they enforce uniform global collaboration, which compromises consensus stability. This work proposes a dual-mechanism framework: (1) endogenous belief calibration to mitigate intra-system belief conflicts, and (2) belief-compatibility-driven dynamic selection of collaboration partners, matching each agent with its optimal collaborators. The approach integrates multi-agent systems theory, interpretable consensus mechanisms, a novel belief calibration model, and a principled collaborator selection algorithm. Evaluated on the MATH and MMLU benchmarks, it achieves absolute accuracy improvements of 2.23% and 3.95%, respectively, substantially outperforming state-of-the-art methods. Key contributions include the first formalization of system-internal belief calibration for consensus seeking, a theoretically grounded dynamic collaborator selection framework based on belief compatibility, and empirical validation of improved robustness and accuracy in multi-agent reasoning.
📝 Abstract
A multi-agent system (MAS) enhances its capacity to solve complex natural language processing (NLP) tasks through collaboration among multiple agents, where consensus seeking serves as a fundamental mechanism. However, existing consensus-seeking approaches typically rely on voting mechanisms to judge consensus, overlooking contradictions in system-internal beliefs that destabilize the consensus. Moreover, these methods often have agents update their results through indiscriminate collaboration with every other agent. Such uniform interaction fails to identify the optimal collaborators for each agent, hindering the emergence of a stable consensus. To address these challenges, we provide a theoretical framework for selecting the optimal collaborators that maximize consensus stability. Building on these theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework, which facilitates stable consensus by selecting optimal collaborators and calibrating the consensus judgment with system-internal beliefs. Experimental results on the MATH and MMLU benchmark datasets demonstrate that BCCS outperforms the best existing results in accuracy by 2.23% and 3.95% on challenging tasks, respectively. Our code and data are available at https://github.com/dengwentao99/BCCS.
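To make the two ideas concrete, here is a minimal toy sketch of belief-compatibility-based partner selection followed by a belief-blending update. It is NOT the paper's actual BCCS algorithm (see the repository above for that): the cosine-similarity compatibility score, the `k`, `alpha`, and function names are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two belief (confidence) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_partners(beliefs, k=1):
    """For each agent, pick the k peers with the most compatible beliefs
    (here: highest cosine similarity) -- a toy stand-in for the paper's
    belief-compatibility criterion."""
    n = len(beliefs)
    partners = {}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(beliefs[i], beliefs[j]),
            reverse=True,
        )
        partners[i] = ranked[:k]
    return partners

def calibrated_update(beliefs, partners, alpha=0.5):
    """Blend each agent's belief with the mean belief of its selected
    partners, then renormalize so it remains a distribution."""
    updated = []
    for i, b in enumerate(beliefs):
        mean = [sum(beliefs[j][d] for j in partners[i]) / len(partners[i])
                for d in range(len(b))]
        mixed = [(1 - alpha) * b[d] + alpha * mean[d] for d in range(len(b))]
        s = sum(mixed)
        updated.append([x / s for x in mixed])
    return updated

# Four agents' confidence over three candidate answers:
beliefs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
           [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
partners = select_partners(beliefs)  # agents 0/1 and 2/3 pair up
updated = calibrated_update(beliefs, partners)
```

In this toy, each agent collaborates only with like-minded peers rather than with every agent uniformly, so the two disagreeing clusters each converge internally instead of averaging into an unstable middle ground.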