Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Although self-consistency reasoning enhances the reliability of large language models, its reliance on multiple samples incurs substantial computational overhead. Existing adaptive approaches often depend on simple vote counting, neglecting the confidence of individual responses and thereby generating redundant samples. This work proposes ReASC, a novel method that introduces response-level confidence to guide sampling decisions through a two-stage mechanism: rapid single-sample judgment followed by confidence-frequency jointly weighted aggregation, thereby overcoming the limitations of conventional majority voting. Evaluated across five models and four datasets, ReASC consistently achieves the best trade-off between accuracy and inference cost, reducing computational expense by up to 70% on Gemma-3-4B-it while maintaining competitive accuracy.

📝 Abstract
Self-Consistency improves reasoning reliability through multi-sample aggregation, but incurs substantial inference cost. Adaptive self-consistency methods mitigate this issue by adjusting the sampling budget; however, they rely on count-based stopping rules that treat all responses equally, often leading to unnecessary sampling. We propose Reliability-Aware Adaptive Self-Consistency (ReASC), which addresses this limitation by reframing adaptive sampling from response counting to evidence sufficiency, leveraging response-level confidence for principled information aggregation. ReASC operates in two stages: a single-sample decision stage that resolves instances confidently answerable from a single response, and a reliability-aware accumulation stage that aggregates responses by jointly leveraging their frequency and confidence. Across five models and four datasets, ReASC consistently achieves the best accuracy-cost trade-off compared to existing baselines, yielding improved inference efficiency across model scales from 3B to 27B parameters. As a concrete example, ReASC reduces inference cost by up to 70% relative to self-consistency while preserving accuracy on GSM8K using Gemma-3-4B-it.
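The two-stage mechanism described in the abstract can be sketched in a few lines. The code below is an illustrative approximation, not the authors' implementation: the function name `reasc_sketch`, the thresholds `single_conf` and `stop_mass`, and the `sample_fn` interface are all assumptions introduced for demonstration.

```python
from collections import defaultdict

def reasc_sketch(sample_fn, max_samples=10, single_conf=0.9, stop_mass=0.8):
    """Adaptive sampling sketch in the spirit of ReASC.

    sample_fn() -> (answer, confidence in [0, 1]); thresholds are
    illustrative assumptions, not values from the paper.
    Returns (final_answer, number_of_samples_used).
    """
    # Stage 1: single-sample decision -- accept a lone response
    # outright if its confidence clears a high threshold.
    answer, conf = sample_fn()
    if conf >= single_conf:
        return answer, 1

    # Stage 2: reliability-aware accumulation -- each candidate
    # answer accumulates its responses' confidences, so frequency
    # and confidence are weighted jointly; stop early once one
    # answer holds enough of the total confidence mass.
    weights = defaultdict(float)
    weights[answer] += conf
    n = 1
    while n < max_samples:
        answer, conf = sample_fn()
        weights[answer] += conf
        n += 1
        best, best_w = max(weights.items(), key=lambda kv: kv[1])
        if best_w / sum(weights.values()) >= stop_mass:
            return best, n
    best, _ = max(weights.items(), key=lambda kv: kv[1])
    return best, n
```

In contrast to count-based stopping rules, a low-confidence dissenting response here barely delays termination, which is the source of the sampling savings the paper reports.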
Problem

Research questions and friction points this paper is trying to address.

Self-Consistency
Adaptive Sampling
Inference Cost
Reliability
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive self-consistency
reliability-aware sampling
confidence-based aggregation
efficient LLM inference
evidence sufficiency