As Language Models Scale, Low-order Linear Depth Dynamics Emerge

📅 2026-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models function as high-dimensional nonlinear black boxes, lacking interpretable internal dynamics. This work proposes low-order linear dynamical systems (e.g., a 32-dimensional proxy) to accurately approximate the influence of individual Transformer layers on model outputs. Through inter-layer sensitivity analysis and additive intervention experiments, the study finds that as model scale increases, the behavioral alignment between a fixed-order linear proxy and the true model consistently improves, enabling efficient multi-layer interventions. On tasks involving toxicity, sarcasm, hate speech, and sentiment, the proxy almost perfectly replicates the layer-wise sensitivity profiles of GPT-2-large, offering a scalable and interpretable paradigm for understanding the internal mechanisms of large language models.

📝 Abstract
Large language models are often viewed as high-dimensional nonlinear systems and treated as black boxes. Here, we show that transformer depth dynamics admit accurate low-order linear surrogates within context. Across tasks including toxicity, irony, hate speech and sentiment, a 32-dimensional linear surrogate reproduces the layerwise sensitivity profile of GPT-2-large with near-perfect agreement, capturing how the final output shifts under additive injections at each layer. We then uncover a surprising scaling principle: for a fixed-order linear surrogate, agreement with the full model improves monotonically with model size across the GPT-2 family. This linear surrogate also enables principled multi-layer interventions that require less energy than standard heuristic schedules when applied to the full model. Together, our results reveal that as language models scale, low-order linear depth dynamics emerge within contexts, offering a systems-theoretic foundation for analyzing and controlling them.
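The abstract's core idea (approximating transformer depth dynamics with a fixed low-order linear map) can be sketched as follows. This is an illustrative toy, not the paper's actual procedure: it uses synthetic hidden-state trajectories, a PCA-style projection to 32 dimensions, and a least-squares fit of a single layer-to-layer transition matrix; all variable names are hypothetical.

```python
import numpy as np

# Toy sketch, assuming synthetic data: fit a low-order linear surrogate
# h_{l+1} ~= A @ h_l to "hidden state" trajectories across depth,
# after projecting to r = 32 dimensions (the paper's surrogate order).
rng = np.random.default_rng(0)

d, L, n = 256, 12, 200   # hidden width, number of layers, number of tokens
r = 32                   # surrogate order

# Synthetic depth trajectories: each H[l] has shape (n, d).
A_true = np.eye(d) + 0.05 * rng.standard_normal((d, d)) / np.sqrt(d)
H = [rng.standard_normal((n, d))]
for _ in range(L):
    H.append(H[-1] @ A_true.T + 0.01 * rng.standard_normal((n, d)))

# Project all states onto the top-r principal directions.
stacked = np.concatenate(H, axis=0)                    # (n*(L+1), d)
_, _, Vt = np.linalg.svd(stacked - stacked.mean(0), full_matrices=False)
P = Vt[:r]                                             # (r, d) projection

Z = [h @ P.T for h in H]                               # r-dim trajectories
X = np.concatenate(Z[:-1], axis=0)                     # states at layer l
Y = np.concatenate(Z[1:], axis=0)                      # states at layer l+1

# Least-squares fit of a single r x r depth-transition matrix.
A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
A_hat = A_hat.T

# One-step relative prediction error of the surrogate in reduced space.
err = np.linalg.norm(X @ A_hat.T - Y) / np.linalg.norm(Y)
print(f"surrogate order: {r}, relative one-step error: {err:.3f}")
```

In this toy setting the fitted 32-dimensional transition matrix predicts the next layer's projected state with small relative error; the paper's claim is the far stronger statement that such surrogates also reproduce layer-wise sensitivity profiles of real GPT-2 models.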
Problem

Research questions and friction points this paper is trying to address.

language models
depth dynamics
linear surrogates
scaling
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

linear surrogate
depth dynamics
scaling law
transformer interpretability
model intervention
🔎 Similar Papers
No similar papers found.
Buddhika Nettasinghe
University of Iowa, Iowa City, USA
Geethu Joseph
Delft University of Technology
Statistical signal processing