Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Majority voting in multi-LLM collaborative reasoning overlooks model heterogeneity and inter-model response correlations, leading to suboptimal ensemble decisions. Method: This paper proposes a novel aggregation paradigm integrating first-order information (individual model confidence) and second-order information (cross-model response consistency). We design two theoretically grounded algorithms: Optimal Weight—derived from statistically optimal weighting—and Inverse Surprising Popularity—which leverages counterintuitive consensus signals. Contribution/Results: We formally prove the statistical superiority of both algorithms and reveal the critical role of higher-order correlations in enhancing collective decision-making. Extensive evaluation across synthetic data, UltraFeedback, MMLU, and the real-world healthcare benchmark ARMMAN demonstrates that our approach consistently outperforms majority voting, improving accuracy by 3.2–9.7 percentage points on average. Moreover, it significantly boosts the robustness and reliability of multi-agent systems across diverse domains.
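To make the first-order idea concrete, here is a minimal sketch contrasting plain majority voting with confidence-weighted voting. The function names and the use of raw self-reported confidences as weights are illustrative assumptions; the paper's Optimal Weight algorithm derives statistically optimal weights, which this sketch does not reproduce.

```python
from collections import defaultdict

def majority_vote(answers):
    # Baseline: every model's answer counts equally.
    counts = defaultdict(int)
    for a in answers:
        counts[a] += 1
    return max(counts, key=counts.get)

def weighted_vote(answers, confidences):
    # First-order aggregation: each answer is weighted by that model's
    # confidence (hypothetical choice of weight; the paper's Optimal
    # Weight algorithm derives weights with statistical guarantees).
    scores = defaultdict(float)
    for a, c in zip(answers, confidences):
        scores[a] += c
    return max(scores, key=scores.get)
```

With three models answering ["A", "A", "B"] at confidences [0.4, 0.4, 0.95], majority voting returns "A" while the weighted rule returns "B", showing how per-model information can overturn an equal-weight consensus.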

📝 Abstract
With the rapid progress of multi-agent large language model (LLM) reasoning, how to effectively aggregate answers from multiple LLMs has emerged as a fundamental challenge. Standard majority voting treats all answers equally, failing to consider latent heterogeneity and correlation across models. In this work, we design two new aggregation algorithms called Optimal Weight (OW) and Inverse Surprising Popularity (ISP), leveraging both first-order and second-order information. Our theoretical analysis shows these methods provably mitigate inherent limitations of majority voting under mild assumptions, leading to more reliable collective decisions. We empirically validate our algorithms on synthetic datasets, popular LLM fine-tuning benchmarks such as UltraFeedback and MMLU, and a real-world healthcare setting ARMMAN. Across all cases, our methods consistently outperform majority voting, offering both practical performance gains and conceptual insights for the design of robust multi-agent LLM pipelines.
Problem

Research questions and friction points this paper is trying to address.

Aggregating multiple LLM answers beyond majority voting
Leveraging higher-order information to address model heterogeneity
Designing robust algorithms for reliable multi-agent LLM decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging first and second-order information for aggregation
Introducing Optimal Weight and Inverse Surprising Popularity algorithms
Mitigating majority voting limitations with theoretical guarantees
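The second-order signal the Innovation list refers to can be illustrated with the classic "surprisingly popular" rule (Prelec et al.): pick the answer whose actual vote share exceeds the share respondents predicted it would receive. This is only a sketch of that related baseline; the paper's Inverse Surprising Popularity is a distinct variant of this idea, and the function name and inputs here are assumptions for illustration.

```python
from collections import Counter

def surprisingly_popular(votes, predicted_shares):
    # Second-order aggregation sketch: compare the actual vote share of
    # each answer against the share the voters predicted others would
    # give it, and return the answer that is most "surprisingly" popular.
    # predicted_shares maps answer -> predicted fraction of votes.
    n = len(votes)
    actual = {a: c / n for a, c in Counter(votes).items()}
    return max(actual, key=lambda a: actual[a] - predicted_shares.get(a, 0.0))
```

For votes ["A", "A", "B"] with predicted shares {"A": 0.8, "B": 0.2}, the rule returns "B": although "A" wins the raw count, "B" is more popular than the crowd expected, which is exactly the cross-model consistency signal majority voting discards.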