Roundtable Policy: Improving Scientific Reasoning and Narratives through Confidence-Weighted Consensus of LLMs

📅 2025-09-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from hallucination, insufficient creativity, and weak logical rigor when tackling complex, heterogeneous scientific tasks. Method: This paper proposes a confidence-weighted multi-model deliberation framework inspired by scientific committee collaboration. It employs a multi-agent architecture integrating chain-of-thought reasoning and self-consistency enhancement to establish an interpretable consensus mechanism—enabling black-box collaborative inference without access to model internals. Contribution/Results: The core innovation lies in dynamically weighting model outputs by their predicted confidence scores to guide weighted voting and iterative deliberation. This significantly improves scientific reasoning accuracy, narrative coherence, and creative output while reducing hallucination rates. Extensive experiments demonstrate strong generalization and robustness across diverse scientific domains, including physics, chemistry, biology, and interdisciplinary reasoning tasks.
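The core aggregation step described above — weighting each model's answer by its predicted confidence before voting — can be sketched as follows. This is a minimal illustration, not the paper's exact procedure; the function name, the additive scoring scheme, and the example confidences are all assumptions.

```python
from collections import defaultdict

def weighted_vote(answers, confidences):
    """Aggregate candidate answers by summing each model's
    self-reported confidence as its vote weight (illustrative)."""
    scores = defaultdict(float)
    for answer, conf in zip(answers, confidences):
        scores[answer] += conf
    # The consensus answer is the one with the largest total confidence mass.
    return max(scores, key=scores.get)

# Three hypothetical models answer the same question; two agree.
answers = ["42", "41", "42"]
confidences = [0.9, 0.6, 0.7]
print(weighted_vote(answers, confidences))  # → 42
```

In a full deliberation loop, this vote would feed back into further rounds rather than terminate after one pass.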

📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities not only in language generation but also in advancing scientific discovery. A growing body of work has explored ways to improve their reasoning, from self-consistency and chain-of-thought to multi-agent debate. Inspired by the dynamics of scientific committees and the "Society of Mind," we introduce Roundtable Policy, a complementary inference-time reasoning framework that performs inference through the weighted consensus of multiple LLMs. Our findings indicate that this approach significantly enhances reasoning in complex heterogeneous scientific tasks and improves scientific narratives in terms of creativity, rigor, and logical coherence, while reducing hallucinations that single models are prone to. Our approach emphasizes structured and interpretable consensus rather than opaque convergence, while requiring only black-box access and uniform procedures, making it broadly applicable to multi-LLM reasoning.
Problem

Research questions and friction points this paper is trying to address.

Enhancing scientific reasoning in complex heterogeneous tasks
Improving the creativity, rigor, and logical coherence of scientific narratives
Reducing the hallucinations that single LLMs are prone to
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weighted consensus of multiple LLMs
Structured and interpretable consensus framework
Black-box access with uniform procedures
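The "black-box access with uniform procedures" idea can be made concrete with a small sketch: each model is treated purely as a text-in/text-out function, and every model runs the same deliberation routine. The interface, prompt wording, and revision protocol below are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List

# A "black-box" model is just text in, text out; no access to internals.
Model = Callable[[str], str]

def deliberate(models: List[Model], question: str, rounds: int = 2) -> List[str]:
    """Run uniform deliberation rounds: every model sees the question
    plus all current answers, and may revise its own answer."""
    answers = [m(question) for m in models]
    for _ in range(rounds):
        transcript = "\n".join(f"Answer {i}: {a}" for i, a in enumerate(answers))
        prompt = (f"{question}\n\nPeer answers so far:\n{transcript}\n\n"
                  "Revise your answer if the peers' reasoning changes your view.")
        answers = [m(prompt) for m in models]
    return answers
```

Because the loop calls each model through the same `Model` signature, any mix of LLM backends can participate without exposing weights or logits.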