🤖 AI Summary
Large language models (LLMs) are prone to hallucination, limited creativity, and lapses in logical rigor when tackling complex, heterogeneous scientific tasks.
Method: This paper proposes a confidence-weighted multi-model deliberation framework inspired by scientific committee collaboration. It employs a multi-agent architecture integrating chain-of-thought reasoning and self-consistency enhancement to establish an interpretable consensus mechanism—enabling black-box collaborative inference without access to model internals.
Contribution/Results: The core innovation lies in dynamically weighting model outputs by their predicted confidence scores to guide weighted voting and iterative deliberation. This significantly improves scientific reasoning accuracy, narrative coherence, and creative output while reducing hallucination rates. Extensive experiments demonstrate strong generalization and robustness across diverse scientific domains, including physics, chemistry, biology, and interdisciplinary reasoning tasks.
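The confidence-weighted voting step described above can be sketched as follows. This is an illustrative assumption of how such aggregation might work, not the paper's actual implementation: `roundtable_vote`, the answer strings, and the confidence scores are all hypothetical.

```python
from collections import defaultdict

def roundtable_vote(outputs):
    """Aggregate (answer, confidence) pairs from multiple black-box models
    via confidence-weighted voting: each model's answer contributes its
    predicted confidence as voting weight, and the answer with the highest
    total weight wins. Sketch only; the paper's deliberation loop would
    re-run this over multiple rounds until consensus stabilizes."""
    scores = defaultdict(float)
    for answer, confidence in outputs:
        scores[answer] += confidence
    return max(scores, key=scores.get)

# Mock outputs from three models answering the same question:
votes = [("option A", 0.9), ("option B", 0.6), ("option A", 0.7)]
print(roundtable_vote(votes))  # "option A" (total weight 1.6 vs 0.6)
```

Because only each model's final answer and a confidence score are needed, the scheme requires black-box access alone, consistent with the framework's stated design.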
📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities not only in language generation but also in advancing scientific discovery. A growing body of work has explored ways to improve their reasoning, from self-consistency and chain-of-thought to multi-agent debate. Inspired by the dynamics of scientific committees and the "Society of Mind," we introduce Roundtable Policy, a complementary inference-time reasoning framework that performs inference through the weighted consensus of multiple LLMs. Our findings indicate that this approach significantly enhances reasoning in complex, heterogeneous scientific tasks and improves scientific narratives in terms of creativity, rigor, and logical coherence, while reducing the hallucinations that single models are prone to. Our approach emphasizes structured and interpretable consensus rather than opaque convergence, while requiring only black-box access and uniform procedures, making it broadly applicable to multi-LLM reasoning.