Two-Scale Latent Dynamics for Recurrent-Depth Transformers

📅 2025-09-27

📈 Citations: 0

✨ Influential: 0

career value

207K/year

🤖 AI Summary

This work investigates the geometric dynamics of hidden states in recurrent deep Transformers during iterative test-time generation, revealing a two-scale evolution: fine-grained intra-block optimization (with diminishing step sizes and increasingly orthogonal update directions) and coarse-grained inter-block drift. To exploit this structure, we propose a novel early-exit mechanism based on second-order gradient variation—replacing conventional KL-divergence criteria—to enhance inference stability and efficiency. Our method integrates hidden-state trajectory analysis, differential-geometric measurements (e.g., curvature and orthogonality), and adaptive step-size monitoring, coupled with a convergence criterion tailored for multi-step autoregressive inference. Evaluated across multiple model checkpoints, it achieves an average 23.6% reduction in inference latency, a 31.4% decrease in output variance, and maintains or improves generation quality. The approach establishes a new, interpretable, and robust paradigm for efficient autoregressive modeling.

Technology Category

Application Category

📝 Abstract

Recurrent-depth transformers scale test-time compute by iterating latent computations before emitting tokens. We study the geometry of these iterates and argue for a simple, emph{two-scale} operational picture: (i) within a looped block, updates act as emph{small-scale refinements}; (ii) across consecutive blocks, states undergo a emph{larger-scale drift}. Across checkpoints, our measurements show that loop steps become emph{smaller} and increasingly emph{orthogonal} to one another, indicating better local modeling of fine structure rather than merely pushing in a single direction. These dynamics motivate an early-exit mechanism based on the model's second-order difference in step-size, which we show is superior in terms of performance, stability and time-efficiency, when compared to the KL-divergence exit strategy of Geiping et al. and its naive first-order counterpart.

Problem

Research questions and friction points this paper is trying to address.

Studying geometric dynamics of recurrent-depth transformer iterations

Analyzing two-scale latent updates in looped transformer blocks

Developing early-exit mechanisms using second-order step-size differences

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-scale latent dynamics for recurrent-depth transformers

Early-exit mechanism using second-order step-size difference

Small-scale refinements and large-scale drift in iterations

🔎 Similar Papers

No similar papers found.