🤖 AI Summary
This work proposes a precedent-guided reasoning framework to address the inefficiency and performance degradation in large reasoning models caused by verbose and repetitive chain-of-thought processes. Inspired by human problem-solving through precedents, the method adaptively selects semantically relevant, low-perplexity precedents during inference and dynamically internalizes their solution patterns via lightweight adapters. This shifts reasoning from exhaustive self-exploration to efficient guided generation. Notably, it is the first approach to enable dynamic construction and utilization of a precedent set at test time. Experiments across mathematical reasoning, scientific question answering, and code generation demonstrate that the framework significantly shortens reasoning trajectories while maintaining or even improving accuracy, achieving an excellent trade-off between accuracy and efficiency.
📝 Abstract
Reasoning in Large Language Models (LLMs), and in particular large reasoning models (LRMs), often suffers from inefficient, long chain-of-thought traces full of redundant self-exploration and validation, which inflate computational cost and can even degrade performance. Inspired by human reasoning, where people solve new problems by leveraging related past cases to constrain the search space and reduce trial-and-error, we propose Precedent Informed Reasoning (PIR), which transforms the LRM reasoning paradigm from exhaustive self-exploration to guided learning from precedents. PIR addresses two key challenges: which precedents to adopt and how to utilize them. First, Adaptive Precedent Selection (APS) constructs, for each question and LRM, a compact set of precedents that are both semantically related and informative for the model: it ranks candidate examples by a joint score combining semantic similarity and model perplexity, then adapts the number of selected precedents to maximize perplexity reduction. Second, Test-time Experience Internalization (TEI) casts precedent-informed instruction as test-time learning, updating lightweight adapters to internalize solution patterns and use them as a prior during subsequent reasoning. Experiments across mathematical reasoning, scientific QA, and code generation demonstrate that PIR consistently shortens reasoning traces while maintaining or improving final accuracy across LLMs, yielding outstanding accuracy-efficiency trade-offs.
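The APS procedure described above (rank by a joint similarity/perplexity score, then pick the precedent count that maximizes perplexity reduction) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `joint_score` weighting `alpha`, and the assumption that perplexities conditioned on each top-k prefix (`ppl_with_top_k`) are precomputed with the LRM, are both hypothetical simplifications.

```python
import math

def joint_score(similarity, perplexity, alpha=0.5):
    # Hypothetical joint score: reward semantic similarity to the
    # target question, penalize log-perplexity of the candidate
    # under the model; alpha is an assumed balancing weight.
    return alpha * similarity - (1.0 - alpha) * math.log(perplexity)

def select_precedents(similarities, perplexities, ppl_with_top_k, alpha=0.5):
    """Adaptive Precedent Selection sketch.

    similarities[i], perplexities[i]: scores for candidate precedent i.
    ppl_with_top_k[k]: model perplexity on the target question when
    conditioned on the top-k ranked precedents (k = 0 means none);
    assumed to be precomputed with the LRM.
    Returns the indices of the selected precedents.
    """
    # Stage 1: rank candidates by the joint score (best first).
    scores = [joint_score(s, p, alpha)
              for s, p in zip(similarities, perplexities)]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Stage 2: choose the prefix size k that maximizes perplexity
    # reduction relative to using no precedents at all.
    reductions = [ppl_with_top_k[0] - p for p in ppl_with_top_k]
    best_k = max(range(len(reductions)), key=lambda k: reductions[k])
    return order[:best_k]
```

For example, with three candidates where the first is highly similar and low-perplexity, the selector ranks it first and then keeps only as many precedents as actually lower the model's perplexity on the question.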