🤖 AI Summary
Large language models (LLMs) suffer from hallucination and inefficiency in self-consistency reasoning because of blind fixed-size sampling and imprecise path selection. To address this, the authors propose Reasoning-Aware Self-Consistency (RASC), a framework that jointly models the credibility of reasoning paths and the consistency of answers. RASC introduces a dynamic quality-assessment mechanism that drives early stopping during multi-path generation and enables credibility-weighted majority voting. Unlike conventional fixed-sampling approaches, RASC evaluates reasoning quality as each sample is generated, retains high-quality paths, and aggregates answers in proportion to their estimated reliability. Across diverse question-answering benchmarks, RASC reduces average sampling overhead by roughly 70% while preserving accuracy and improves the faithfulness and interpretability of the selected rationales, balancing inference efficiency and logical fidelity under resource constraints.
📝 Abstract
Self-Consistency mitigates hallucinations in Large Language Models (LLMs) by sampling multiple reasoning paths, but it lacks a systematic approach to determining the optimal number of samples or selecting the most faithful rationale. To address this limitation, we introduce Reasoning-Aware Self-Consistency (RASC), a novel framework that enhances sampling efficiency and reasoning faithfulness by dynamically evaluating both outputs and rationales. RASC assesses the quality of reasoning and the consistency of answers for each generated sample, using these assessments to guide early stopping decisions and rationale selection. The framework employs criteria-based stopping and weighted majority voting, enabling more informed choices about when to halt sampling and which rationale to select. Comprehensive experiments across diverse question-answering datasets demonstrate that RASC outperforms existing methods, reducing sample usage by approximately 70% while maintaining accuracy. Moreover, RASC facilitates the selection of high-fidelity rationales, thereby improving the faithfulness of LLM outputs. Our approach effectively addresses the efficiency-accuracy trade-off in LLM reasoning tasks, offering a new perspective for more nuanced, faithful, and effective utilization of LLMs in resource-constrained environments.
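The core loop described above (score each sampled rationale, accumulate credibility-weighted votes, and stop early once a stopping criterion is met) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sampler` and `quality_fn` are hypothetical stand-ins for the LLM sampling call and the rationale quality assessor, and the margin-based stopping rule is one plausible instance of the paper's criteria-based stopping.

```python
from collections import defaultdict

def rasc_vote(sampler, quality_fn, max_samples=10, stop_margin=2.0):
    """Credibility-weighted majority voting with early stopping (sketch).

    sampler    -- callable returning (rationale, answer) per call (hypothetical)
    quality_fn -- callable scoring a rationale's quality in [0, 1] (hypothetical)
    Stops once the leading answer's weighted votes exceed the runner-up's
    by `stop_margin`, or after `max_samples` draws.
    """
    votes = defaultdict(float)
    used = 0
    for _ in range(max_samples):
        rationale, answer = sampler()
        used += 1
        # Each answer's vote is weighted by its rationale's quality score.
        votes[answer] += quality_fn(rationale)
        ranked = sorted(votes.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        # Criteria-based early stopping: halt when the leader is far enough ahead.
        if ranked[0] - runner_up >= stop_margin:
            break
    return max(votes, key=votes.get), used

# Toy demo with a deterministic "sampler" and keyword-based quality scorer.
samples = iter([("sound derivation", "42"), ("sound derivation", "42"),
                ("weak guesswork", "7")])
best, used = rasc_vote(next(samples).__iter__().__next__ if False else lambda: next(samples),
                       lambda r: 1.0 if "sound" in r else 0.2)
```

With two high-quality samples agreeing on "42", the weighted lead reaches the margin after just two draws, so the third (lower-quality) sample is never generated; this is the sampling saving the abstract reports.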