🤖 AI Summary
This work addresses the high computational cost of Self-Consistency (SC) in large language model inference, which stems from fully decoding every sampled reasoning path. To mitigate this, we propose PoLR, a plug-and-play pre-filtering method that requires no fine-tuning. PoLR clusters short prefixes of the sampled reasoning paths, identifies the dominant cluster, and expands only the paths in that cluster. The approach is grounded in a theoretical analysis integrating prefix consistency, mutual information, and entropy, and it composes with adaptive inference mechanisms such as Early-Stopping SC. Evaluated on benchmarks including GSM8K and MATH500, PoLR matches or even surpasses the accuracy of standard Self-Consistency while reducing token consumption by up to 60% and latency by up to 50%.
📝 Abstract
Large language models achieve strong reasoning performance, but inference strategies such as Self-Consistency (SC) are computationally expensive because they fully decode every sampled reasoning trace. We introduce PoLR (Path of Least Resistance), the first inference-time method to leverage prefix consistency for compute-efficient reasoning. PoLR clusters short prefixes of reasoning traces, identifies the dominant cluster, and expands only the paths in that cluster, preserving the accuracy benefits of SC while substantially reducing token usage and latency. Our theoretical analysis, framed via mutual information and entropy, explains why early reasoning steps encode strong signals predictive of final correctness. Empirically, PoLR consistently matches or exceeds SC across GSM8K, MATH500, AIME24/25, and GPQA-DIAMOND, reducing token usage by up to 60% and wall-clock latency by up to 50%. Moreover, PoLR is fully complementary to adaptive inference methods (e.g., Adaptive Consistency, Early-Stopping SC) and can serve as a drop-in pre-filter, making SC substantially more efficient and scalable without requiring model fine-tuning.
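The cluster-then-filter loop described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names (`polr_prefilter`, `normalize`) are invented here, and real prefix clustering would presumably use a semantic or learned similarity rather than the crude token-key grouping used in this toy.

```python
from collections import Counter

def normalize(prefix: str) -> str:
    # Crude cluster key: lowercase the prefix and keep its first few tokens.
    # A real system would likely embed prefixes and cluster by similarity.
    return " ".join(prefix.lower().split()[:8])

def polr_prefilter(prefixes: list[str]) -> list[int]:
    """Return indices of the dominant prefix cluster (hypothetical sketch).

    Paths outside the dominant cluster are dropped before full decoding,
    which is where the token/latency savings would come from.
    """
    keys = [normalize(p) for p in prefixes]
    dominant_key, _ = Counter(keys).most_common(1)[0]
    return [i for i, k in enumerate(keys) if k == dominant_key]

# Toy example: 5 sampled short prefixes; 3 share the same early reasoning.
prefixes = [
    "First compute 12 * 4 = 48",
    "First compute 12 * 4 = 48",
    "Let x be the unknown quantity",
    "First compute 12 * 4 = 48",
    "We can try small cases",
]
keep = polr_prefilter(prefixes)  # indices of the dominant cluster
# Only these paths would then be expanded into full reasoning traces
# and passed to the usual SC majority vote.
```

Under this sketch, only 3 of the 5 sampled paths are decoded to completion, so roughly 40% of the continuation tokens are never generated in this toy case.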