🤖 AI Summary
This study investigates the selection and evaluation of decision-making protocols in multi-agent debate. Using controlled experiments within a unified multi-agent debate framework that varies only the protocol, we systematically assess seven decision protocols—including majority voting and unanimity consensus—on knowledge-intensive tasks (MMLU, MMLU-Pro, GPQA) and reasoning tasks (StrategyQA, MuSR, SQuAD 2.0). We propose two novel mechanisms that increase answer diversity: All-Agents Drafting (AAD) and Collective Improvement (CI), which also mitigates limitations inherent to single-protocol reliance. Empirical results show that voting-based protocols improve performance by 13.2% on reasoning tasks, while consensus-based protocols yield a 2.8% gain on knowledge tasks. AAD and CI achieve up to 3.3% and 7.4% accuracy improvements, respectively. Our findings establish the decision mechanism as a critical determinant of multi-agent collaborative efficacy, providing both empirical grounding and a new methodology for protocol selection.
📝 Abstract
Much of the success of multi-agent debate depends on carefully choosing the right parameters. Among them, the decision-making protocol stands out. Systematic comparison of decision protocols is difficult because studies alter multiple discussion parameters beyond the protocol. So far, it has been largely unknown how decision-making addresses the challenges of different tasks. This work systematically evaluates the impact of seven decision protocols (e.g., majority voting, unanimity consensus). We change only one variable at a time (i.e., the decision protocol) to analyze how different methods affect the collaboration between agents, and we test the protocols on knowledge datasets (MMLU, MMLU-Pro, GPQA) and reasoning datasets (StrategyQA, MuSR, SQuAD 2.0). Our results show that voting protocols improve performance by 13.2% on reasoning tasks and consensus protocols by 2.8% on knowledge tasks over the other decision protocols. Increasing the number of agents improves performance, while more discussion rounds before voting reduce it. To improve decision-making by increasing answer diversity, we propose two new methods: All-Agents Drafting (AAD) and Collective Improvement (CI). Our methods improve task performance by up to 3.3% with AAD and up to 7.4% with CI. This work demonstrates the importance of decision-making in multi-agent debates beyond scaling.
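To make the two protocol families concrete, here is a minimal sketch of how a debate framework might aggregate agents' final answers under majority voting versus unanimity consensus. This is an illustration under assumed conventions, not the paper's implementation; the function names and the `None`-on-disagreement convention are hypothetical.

```python
from collections import Counter

def majority_vote(answers):
    # Pick the most common answer among the agents' responses
    # (ties broken arbitrarily by Counter ordering).
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

def unanimity_consensus(answers):
    # Accept an answer only if every agent agrees; otherwise signal
    # no decision (e.g., to trigger another discussion round).
    return answers[0] if len(set(answers)) == 1 else None

# Three agents answer a multiple-choice question after debating.
votes = ["B", "A", "B"]
print(majority_vote(votes))        # "B" wins 2-to-1
print(unanimity_consensus(votes))  # None: no unanimous agreement yet
```

The key behavioral difference this sketch highlights: voting always terminates with an answer, while unanimity consensus may require further discussion rounds, which is one reason the two families can behave differently on reasoning versus knowledge tasks.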