🤖 AI Summary
This work addresses the high computational cost of self-consistency (SC) reasoning in large language models, which typically requires drawing many samples. The authors propose a hybrid ensembling framework that combines Chain-of-Thought (CoT) and Program-of-Thought (PoT) reasoning, with specific strategies for both full sampling and early stopping. The approach improves overall accuracy while reducing the number of samples SC requires by a factor of 9.3. Notably, 78.6% of tasks can be addressed with only two samples, which the authors report was not possible with any prior SC method. The method thus challenges the conventional reliance on large sample counts for robust self-consistency.
📝 Abstract
Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large language models by aggregating multiple sampled outputs, but it comes at a high computational cost due to extensive sampling. We introduce a hybrid ensembling approach that leverages the complementary strengths of two distinct modes of reasoning: Chain-of-Thought (CoT) and Program-of-Thought (PoT). We describe a general framework for combining these two forms of reasoning in self-consistency, as well as particular strategies for both full sampling and early stopping. We show that CoT-PoT ensembling not only improves overall accuracy, but also drastically reduces the number of samples required for SC by a factor of 9.3. In particular, the majority of tasks (78.6%) can be addressed with only two samples, which has not been possible with any prior SC method.
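The abstract does not spell out the stopping rule, so the following is only a minimal sketch of what CoT-PoT self-consistency with early stopping could look like, under assumptions of our own. It alternates one CoT and one PoT sample, takes a majority vote over parsed answers, and stops early once the leading answer has been produced by both modes and leads the runner-up by a hypothetical `margin`. The function name `hybrid_self_consistency`, the sampler callables, `max_pairs`, and `margin` are all illustrative inventions, not the authors' API or their exact strategy. Under this assumed rule, two samples suffice whenever the first CoT answer and the first PoT answer agree.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

# Hypothetical sampler: returns a parsed final answer, or None if parsing
# (CoT) or program execution (PoT) failed.
Sampler = Callable[[], Optional[Hashable]]


def hybrid_self_consistency(
    sample_cot: Sampler,
    sample_pot: Sampler,
    max_pairs: int = 20,
    margin: int = 2,
) -> Tuple[Optional[Hashable], int]:
    """Majority vote over alternating CoT/PoT samples with an ASSUMED early-stop rule.

    Returns (answer, number_of_samples_drawn).
    """
    votes: Counter = Counter()
    modes: Dict[Hashable, Set[str]] = {}  # answer -> modes that produced it
    drawn = 0
    for _ in range(max_pairs):
        # Draw one sample from each reasoning mode per round.
        for mode, sampler in (("cot", sample_cot), ("pot", sample_pot)):
            answer = sampler()
            drawn += 1
            if answer is not None:  # ignore unparsable or failed samples
                votes[answer] += 1
                modes.setdefault(answer, set()).add(mode)
        ranked = votes.most_common()  # stable sort: ties keep first-seen order
        if ranked:
            top, top_count = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Assumed rule: both modes back the leader and it leads by `margin`.
            if modes[top] == {"cot", "pot"} and top_count - runner_up >= margin:
                return top, drawn
    # Budget exhausted: fall back to a plain majority vote.
    ranked = votes.most_common()
    return (ranked[0][0] if ranked else None), drawn


if __name__ == "__main__":
    cot = iter([42])
    pot = iter([42])
    # First CoT and PoT answers agree, so the sketch stops after two samples.
    print(hybrid_self_consistency(lambda: next(cot), lambda: next(pot)))  # (42, 2)
```

The fallback branch makes the sketch degrade to ordinary majority-vote SC over the full budget, so the early-stop rule only ever saves samples; whether the paper's strategies use a margin, a vote threshold, or something else entirely is not stated in the abstract.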