Reliability-Aware Adaptive Self-Consistency for Efficient Sampling in LLM Reasoning

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Although self-consistency reasoning enhances the reliability of large language models, its reliance on multiple samples incurs substantial computational overhead. Existing adaptive approaches often depend on simple vote counting, neglecting the confidence of individual responses and thereby generating redundant samples. This work proposes ReASC, a novel method that introduces response-level confidence to guide sampling decisions through a two-stage mechanism: rapid single-sample judgment followed by confidence-frequency jointly weighted aggregation, thereby overcoming the limitations of conventional majority voting. Evaluated across five models and four datasets, ReASC consistently achieves the best trade-off between accuracy and inference cost, reducing computational expense by up to 70% on Gemma-3-4B-it while maintaining competitive accuracy.

📝 Abstract
Self-Consistency improves reasoning reliability through multi-sample aggregation, but incurs substantial inference cost. Adaptive self-consistency methods mitigate this issue by adjusting the sampling budget; however, they rely on count-based stopping rules that treat all responses equally, often leading to unnecessary sampling. We propose Reliability-Aware Adaptive Self-Consistency (ReASC), which addresses this limitation by reframing adaptive sampling from response counting to evidence sufficiency, leveraging response-level confidence for principled information aggregation. ReASC operates in two stages: a single-sample decision stage that resolves instances confidently answerable from a single response, and a reliability-aware accumulation stage that aggregates responses by jointly leveraging their frequency and confidence. Across five models and four datasets, ReASC consistently achieves the best accuracy-cost trade-off compared to existing baselines, yielding improved inference efficiency across model scales from 3B to 27B parameters. As a concrete example, ReASC reduces inference cost by up to 70% relative to self-consistency while preserving accuracy on GSM8K using Gemma-3-4B-it.
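The two-stage mechanism described in the abstract can be sketched in a few lines. The code below is an illustrative approximation, not the authors' implementation: the function name `reasc_sketch`, the thresholds `single_conf` and `stop_mass`, and the `sample_fn` interface are all assumptions introduced for demonstration.

```python
from collections import defaultdict

def reasc_sketch(sample_fn, max_samples=10, single_conf=0.9, stop_mass=0.8):
    """Adaptive sampling sketch in the spirit of ReASC.

    sample_fn() -> (answer, confidence in [0, 1]); thresholds are
    illustrative assumptions, not values from the paper.
    Returns (final_answer, number_of_samples_used).
    """
    # Stage 1: single-sample decision -- accept a lone response
    # outright if its confidence clears a high threshold.
    answer, conf = sample_fn()
    if conf >= single_conf:
        return answer, 1

    # Stage 2: reliability-aware accumulation -- each candidate
    # answer accumulates its responses' confidences, so frequency
    # and confidence are weighted jointly; stop early once one
    # answer holds enough of the total confidence mass.
    weights = defaultdict(float)
    weights[answer] += conf
    n = 1
    while n < max_samples:
        answer, conf = sample_fn()
        weights[answer] += conf
        n += 1
        best, best_w = max(weights.items(), key=lambda kv: kv[1])
        if best_w / sum(weights.values()) >= stop_mass:
            return best, n
    best, _ = max(weights.items(), key=lambda kv: kv[1])
    return best, n
```

In contrast to count-based stopping rules, a low-confidence dissenting response here barely delays termination, which is the source of the sampling savings the paper reports.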
Problem

Research questions and friction points this paper is trying to address.

Self-Consistency
Adaptive Sampling
Inference Cost
Reliability
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive self-consistency
reliability-aware sampling
confidence-based aggregation
efficient LLM inference
evidence sufficiency