🤖 AI Summary
This work addresses the limitations of existing activation steering methods, which rely on static control vectors and struggle to adapt to the non-stationary evolution of reasoning processes in complex tasks. To overcome this, we propose STIR, a framework that formulates reasoning enhancement as a dynamic latent trajectory control problem. STIR achieves context-adaptive implicit reasoning through a three-stage pipeline: differential intrinsic action induction, construction of a sparse, geometrically diverse control basis, and anchor-gated, value-modulated trajectory intervention. Our approach is the first to internalize chain-of-thought reasoning as a dynamically controllable sequence of latent actions, thereby transcending the constraints of static guidance. Evaluated across six arithmetic and logical reasoning benchmarks, STIR improves average accuracy by 1.9% to 7.5% while reducing token consumption by up to 35%.
📝 Abstract
The internalization of chain-of-thought processes into hidden states has emerged as a highly efficient paradigm for scaling test-time compute. However, existing activation steering methods rely on static control vectors that fail to adapt to the non-stationary evolution of complex reasoning tasks. To address this limitation, we propose STIR (Self-Distilled Tools for Internal Reasoning), a framework that reformulates reasoning enhancement as a dynamic latent trajectory control problem. STIR introduces a synergistic three-stage pipeline: (1) differential intrinsic action induction harvests latent reasoning successes to crystallize steering primitives; (2) sparse control basis construction curates a compact, geometrically diverse tool library; and (3) value-modulated trajectory intervention dynamically injects context-specific impulses via anchor-based gating. Extensive experiments on six arithmetic and logical benchmarks across four representative models demonstrate that STIR improves average accuracy by 1.9% to 7.5% while reducing average token consumption by up to 35% compared to vanilla decoding. These findings show that the benefits of explicit chain-of-thought can be realized through dynamic latent trajectory control, internalizing the reasoning process to bypass explicit generation while achieving superior fidelity. Our code is available at https://github.com/sznnzs/LLM-Latent-Action.
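To make the intervention idea concrete, the sketch below illustrates one plausible form of anchor-gated, value-modulated steering of a hidden state. All names (`steer_hidden_state`, `basis`, `anchors`, `values`, `alpha`, `threshold`) are illustrative assumptions, not the paper's actual implementation: a primitive from the sparse control basis is injected only when the current hidden state is sufficiently aligned with that primitive's anchor direction, and its strength is scaled by a learned value score.

```python
import numpy as np

# Hedged sketch of anchor-gated, value-modulated latent steering.
# All names and the gating rule are illustrative, not the paper's code.
def steer_hidden_state(h, basis, anchors, values, alpha=1.0, threshold=0.5):
    """Inject a context-specific impulse into hidden state h.

    h:         (d,)   current hidden state
    basis:     (k, d) sparse control basis of steering primitives (unit-norm rows)
    anchors:   (k, d) anchor directions that gate each primitive (unit-norm rows)
    values:    (k,)   value scores modulating each primitive's strength
    alpha:     global steering strength
    threshold: minimum cosine similarity to an anchor for its gate to open
    """
    # Cosine similarity of the hidden state to each anchor direction.
    sims = anchors @ h / (np.linalg.norm(h) + 1e-8)
    gate = (sims > threshold).astype(float)   # hard anchor-based gating
    impulse = (gate * values) @ basis         # value-modulated combination
    return h + alpha * impulse                # steered hidden state
```

With a high threshold no anchor fires and the state passes through unchanged, which is what makes the intervention context-adaptive rather than a static offset applied at every step.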