🤖 AI Summary
Large language models (LLMs) are prone to hallucination, limited creativity, and lapses in logical rigor when tackling complex, heterogeneous scientific tasks.
Method: This paper proposes a confidence-weighted multi-model deliberation framework inspired by scientific committee collaboration. It employs a multi-agent architecture integrating chain-of-thought reasoning and self-consistency enhancement to establish an interpretable consensus mechanism—enabling black-box collaborative inference without access to model internals.
Contribution/Results: The core innovation lies in dynamically weighting model outputs by their predicted confidence scores to guide weighted voting and iterative deliberation. This significantly improves scientific reasoning accuracy, narrative coherence, and creative output while reducing hallucination rates. Extensive experiments demonstrate strong generalization and robustness across diverse scientific domains, including physics, chemistry, biology, and interdisciplinary reasoning tasks.
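The confidence-weighted voting step described above can be sketched as follows. This is an illustrative assumption of how such aggregation might work, not the paper's actual implementation: `roundtable_vote`, the answer strings, and the confidence scores are all hypothetical.

```python
from collections import defaultdict

def roundtable_vote(outputs):
    """Aggregate (answer, confidence) pairs from multiple black-box models
    via confidence-weighted voting: each model's answer contributes its
    predicted confidence as voting weight, and the answer with the highest
    total weight wins. Sketch only; the paper's deliberation loop would
    re-run this over multiple rounds until consensus stabilizes."""
    scores = defaultdict(float)
    for answer, confidence in outputs:
        scores[answer] += confidence
    return max(scores, key=scores.get)

# Mock outputs from three models answering the same question:
votes = [("option A", 0.9), ("option B", 0.6), ("option A", 0.7)]
print(roundtable_vote(votes))  # "option A" (total weight 1.6 vs 0.6)
```

Because only each model's final answer and a confidence score are needed, the scheme requires black-box access alone, consistent with the framework's stated design.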
📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities not only in language generation but also in advancing scientific discovery. A growing body of work has explored ways to improve their reasoning, from self-consistency and chain-of-thought to multi-agent debate. Inspired by the dynamics of scientific committees and the "Society of Mind," we introduce Roundtable Policy, a complementary inference-time reasoning framework that performs inference through the weighted consensus of multiple LLMs. Our findings indicate that this approach significantly enhances reasoning in complex, heterogeneous scientific tasks and improves scientific narratives in terms of creativity, rigor, and logical coherence, while reducing the hallucinations that single models are prone to. Our approach emphasizes structured and interpretable consensus rather than opaque convergence, while requiring only black-box access and uniform procedures, making it broadly applicable to multi-LLM reasoning.