LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models often suffer from “over-reasoning”: continuing computation despite having already gathered sufficient information, leading to wasted computational resources and degraded accuracy. Existing early-exit methods rely on auxiliary sampling, external verifiers, or post-hoc analysis—lacking theoretical guarantees and generalizability. This paper proposes LYNX, an online early-exit mechanism that leverages the model’s own hidden states to identify exit points autonomously, using natural reasoning cues. It integrates a lightweight probe with split conformal prediction to enable zero-shot, distribution-agnostic, confidence-controllable exit across tasks and temperatures. On GSM8K, LYNX reduces token consumption by 40–65% without accuracy loss; on MATH-500, it improves accuracy by 12 percentage points while reducing tokens by 35–60%; on AIME 2024 and CommonsenseQA, it significantly lowers computational overhead while maintaining or improving performance.

📝 Abstract
Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough information to answer correctly. This wastes inference-time compute and can hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. We introduce LYNX, an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues (e.g., "hmm", "wait") during generation, trains a lightweight probe on hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over premature exits. Crucially, we train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. Across three model families spanning 1.5B to 32B parameters, a single mathematically trained probe per base model yields strong accuracy–efficiency tradeoffs. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40–65%; on MATH-500 it improves accuracy by up to 12 points with roughly 35–60% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees.
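The split conformal step in the abstract can be made concrete with a small sketch. The idea is to calibrate a score threshold so that, among exit points where stopping would be premature (the forced exit gives a wrong answer), the probe exceeds the threshold with probability at most a user-chosen alpha. The function name, inputs, and quantile choice below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def calibrate_exit_threshold(cal_scores, cal_correct, alpha=0.1):
    """Split conformal calibration of an exit threshold (sketch).

    cal_scores:  probe confidence at candidate exit points (calibration set)
    cal_correct: whether a forced exit at that point yields the right answer
    alpha:       tolerated rate of premature (incorrect) exits
    """
    scores = np.asarray(cal_scores, dtype=float)
    correct = np.asarray(cal_correct, dtype=bool)
    # Nonconformity scores: probe confidences where exiting would be premature.
    bad = scores[~correct]
    n = len(bad)
    # Finite-sample-corrected conformal quantile: by exchangeability, a fresh
    # premature exit point exceeds this threshold with probability <= alpha.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(bad, level, method="higher")
```

At inference time the model exits only when the probe score is at least this threshold, which is what gives the distribution-free, user-tunable control over premature exits described above.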
Problem

Research questions and friction points this paper is trying to address.

Reducing overthinking in large reasoning models to save compute.
Providing online early-exit decisions with formal confidence guarantees.
Enabling efficient reasoning across diverse tasks without task-specific retraining.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses hidden-state probes at reasoning cues for early exits.
Applies conformal prediction for confidence-controlled stopping decisions.
Trains once on math data, transfers zero-shot across tasks.
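The online mechanism in the bullets above can be sketched as a decoding loop that consults the probe only at cue tokens. Everything here (`step_fn`, `probe`, `CUE_TOKENS`, the `</think>` marker) is a hypothetical interface assumed for illustration, not the paper's API:

```python
# Hypothetical sketch of a LYNX-style online early exit during decoding.
CUE_TOKENS = {"hmm", "wait"}  # naturally occurring reasoning cues

def generate_with_early_exit(step_fn, probe, threshold, max_steps=2048):
    """Generate token by token, checking the probe only at cue tokens.

    step_fn:   returns (token, hidden_state) per decoding step; token is None
               when generation ends naturally (assumed interface)
    probe:     maps a hidden state to an exit-confidence score
    threshold: calibrated exit threshold (e.g., from split conformal prediction)
    """
    tokens = []
    for _ in range(max_steps):
        token, hidden = step_fn()
        if token is None:  # natural end of generation
            break
        tokens.append(token)
        # Probe only at reasoning cues, keeping per-step overhead negligible.
        if token in CUE_TOKENS and probe(hidden) >= threshold:
            tokens.append("</think>")  # stop reasoning and answer now
            break
    return tokens
```

Gating the probe on cue tokens is what keeps the method fully online: no auxiliary sampling or proxy model is invoked, only a cheap score on hidden states the model already computed.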