🤖 AI Summary
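Problem: Standard self-consistency (SC) draws a fixed number of samples regardless of question difficulty. Its variants ASC and ESC adjust the sample count only from the posterior distribution of a set of pre-samples and ignore prior information about difficulty, so easy questions that could be answered in a single attempt are still sampled repeatedly, wasting resources.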
Method: We propose Difficulty-Adaptive Self-Consistency (DSC), a framework that allocates sampling resources according to problem difficulty. DSC combines a lightweight prior difficulty predictor with the posterior consistency distribution to schedule sampling adaptively at a fine granularity. It operates as a plug-in module compatible with mainstream LLM inference frameworks and requires no architectural modifications.
Results: Evaluated on six reasoning benchmarks, DSC reduces average sampling cost by up to 42% relative to ASC/ESC while maintaining comparable accuracy, improving the cost-accuracy trade-off for chain-of-thought reasoning.
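To make the idea of combining prior and posterior difficulty signals concrete, here is a minimal Python sketch. It is not the authors' algorithm: the function name `adaptive_sample`, the `prior_difficulty` score in [0, 1], the thresholds, and the vote-share stopping rule are all assumptions chosen for illustration, and `sample_answer` stands in for one chain-of-thought sample from an LLM.

```python
from collections import Counter
from typing import Callable, Tuple


def adaptive_sample(
    sample_answer: Callable[[], str],   # hypothetical: one LLM sample, returns its final answer
    prior_difficulty: float,            # hypothetical score in [0, 1] from a difficulty predictor
    easy_threshold: float = 0.2,        # assumed cut-off below which a question counts as easy
    window: int = 4,                    # samples drawn between posterior checks
    max_samples: int = 40,              # hard cap on the sampling budget
    agreement: float = 0.8,             # assumed vote share at which sampling stops
) -> Tuple[str, int]:
    """Return (answer, samples_used) for one question."""
    # Prior step: a question predicted to be easy gets a single sample.
    if prior_difficulty < easy_threshold:
        return sample_answer(), 1

    # Posterior step: sample window by window and stop once the
    # leading answer's share of all votes reaches `agreement`.
    votes = Counter()
    used = 0
    while used < max_samples:
        for _ in range(min(window, max_samples - used)):
            votes[sample_answer()] += 1
            used += 1
        top_count = votes.most_common(1)[0][1]
        if top_count / used >= agreement:
            break
    return votes.most_common(1)[0][0], used
```

In this sketch an easy question costs one sample, a hard question with consistent answers stops after the first window, and only questions whose answers keep disagreeing consume the full budget.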
📝 Abstract
Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, shows significant gains across various multi-step reasoning tasks but incurs a high cost because it draws a preset number of samples for every question. Its variants, Adaptive Self-Consistency (ASC) and Early-Stopping Self-Consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples, reducing the cost of SC with minimal impact on performance. However, neither method exploits prior information about question difficulty. This often results in unnecessary repeated sampling for easy questions that could be answered accurately in a single attempt, wasting resources. To tackle this problem, we propose Difficulty-Adaptive Self-Consistency (DSC), which leverages difficulty information from both prior and posterior perspectives to adaptively allocate inference resources, further reducing the cost of SC. To demonstrate the effectiveness of DSC, we conduct extensive experiments on six benchmarks covering three popular categories of reasoning tasks: arithmetic, commonsense, and symbolic reasoning. The empirical results show that DSC consistently outperforms the strong baselines ASC and ESC in terms of cost by a significant margin, while attaining comparable performance.
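For readers unfamiliar with the baselines, the following Python sketch contrasts fixed-budget self-consistency with a simplified early-stopping variant. It is an illustration under stated assumptions, not the ASC or ESC implementation: `sample_answer` is a hypothetical stand-in for one sampled reasoning path reduced to its final answer, and the "stop when a whole window agrees" rule is a simplified substitute for the posterior-based stopping criteria those methods use.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(
    sample_answer: Callable[[], str],  # hypothetical: one LLM sample, returns its final answer
    n_samples: int = 40,
) -> Tuple[str, int]:
    """Fixed-budget SC: always draw n_samples answers, then majority-vote."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0], n_samples


def early_stopping_sc(
    sample_answer: Callable[[], str],
    window: int = 5,                   # assumed window size
    max_samples: int = 40,
) -> Tuple[str, int]:
    """Simplified early stopping: stop as soon as one window is unanimous."""
    votes = Counter()
    used = 0
    while used < max_samples:
        size = min(window, max_samples - used)
        batch = [sample_answer() for _ in range(size)]
        votes.update(batch)
        used += size
        # Posterior signal: a window with a single distinct answer
        # suggests further sampling is unlikely to change the vote.
        if len(set(batch)) == 1:
            break
    return votes.most_common(1)[0][0], used
```

Even the early-stopping variant spends a full window on every question, which is the overhead that prior difficulty information is meant to avoid for questions answerable in one attempt.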