Step-resolved data attribution for looped transformers

📅 2026-02-10
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge of characterizing the influence of individual training samples across the iterative steps of looped transformers, a capability lacking in existing data influence estimation methods. To this end, the authors propose Step-Decomposed Influence (SDI), which unrolls the recurrent computation graph to attribute data influence to each inference step. By integrating the TracIn framework with TensorSketch approximation, SDI avoids materializing explicit per-sample gradients, enabling efficient and scalable fine-grained attribution. Experiments demonstrate that SDI achieves high accuracy and strong scalability on recurrent GPT models and algorithmic reasoning tasks, facilitating multi-dimensional interpretability analyses of the internal reasoning dynamics within recurrent architectures.
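The per-step decomposition can be illustrated with a toy model. The sketch below (an illustration of the idea, not the paper's implementation; all function names are hypothetical) applies a shared weight matrix for $\tau$ iterations, backpropagates through the unrolled loop while keeping each iteration's gradient contribution separate, and forms a TracIn-style score per step. Because the weights are shared, the per-step components sum to the total gradient, so the step-resolved scores recover the ordinary aggregated TracIn value.

```python
import numpy as np

def loop_forward(W, h0, tau):
    """Apply the shared block h <- tanh(W h) for tau loop iterations."""
    hs = [h0]
    for _ in range(tau):
        hs.append(np.tanh(W @ hs[-1]))
    return hs

def per_step_grads(W, h0, y, tau):
    """Backprop through the unrolled loop under L = 0.5*||h_tau - y||^2,
    keeping the gradient contribution of each loop iteration separate.
    Since the block is weight-shared, the total gradient is their sum."""
    hs = loop_forward(W, h0, tau)
    delta = hs[-1] - y                           # dL/dh_tau
    grads = [None] * tau
    for k in range(tau - 1, -1, -1):
        d_pre = delta * (1.0 - hs[k + 1] ** 2)   # tanh' = 1 - tanh^2
        grads[k] = np.outer(d_pre, hs[k])        # k-th application's share
        delta = W.T @ d_pre
    return grads

def sdi_trajectory(W, train, test, tau, lr=0.1):
    """Step-decomposed influence trajectory: dot each per-step gradient
    component of the training example with the test example's total
    gradient. Entries sum to the scalar (aggregated) TracIn score."""
    g_train = per_step_grads(W, *train, tau)
    g_test_total = sum(per_step_grads(W, *test, tau))
    return np.array([lr * np.sum(g * g_test_total) for g in g_train])

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))
train = (rng.normal(size=4), rng.normal(size=4))   # (h0, target)
test = (rng.normal(size=4), rng.normal(size=4))

traj = sdi_trajectory(W, train, test, tau=3)
total = 0.1 * np.sum(sum(per_step_grads(W, *train, 3))
                     * sum(per_step_grads(W, *test, 3)))
print(np.isclose(traj.sum(), total))  # prints True
```

The trajectory reveals *when* during the loop a training example matters, which the single scalar score obscures.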

📝 Abstract
We study how individual training examples shape the internal computation of looped transformers, where a shared block is applied for $\tau$ recurrent iterations to enable latent reasoning. Existing training-data influence estimators such as TracIn yield a single scalar score that aggregates over all loop iterations, obscuring when during the recurrent computation a training example matters. We introduce \textit{Step-Decomposed Influence (SDI)}, which decomposes TracIn into a length-$\tau$ influence trajectory by unrolling the recurrent computation graph and attributing influence to specific loop iterations. To make SDI practical at transformer scale, we propose a TensorSketch implementation that never materialises per-example gradients. Experiments on looped GPT-style models and algorithmic reasoning tasks show that SDI scales well, matches full-gradient baselines with low error, and supports a broad range of data attribution and interpretability tasks, offering per-step insights into the latent reasoning process.
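The TensorSketch idea can be sketched in a few lines. Per-example gradients of a linear layer are outer products of an upstream error and an activation; TensorSketch compresses such an outer product by circularly convolving (via FFT) the CountSketches of its two factors, so inner products between gradients can be estimated without ever forming the $d \times d$ matrices. The code below is a minimal self-contained illustration with hypothetical names and arbitrary dimensions, not the paper's implementation.

```python
import numpy as np

D = 4096          # sketch dimension (illustrative choice)
d = 32            # dimension of each outer-product factor
rng = np.random.default_rng(0)

def count_sketch(x, buckets, signs):
    """CountSketch: each coordinate is hashed to one bucket with a random sign."""
    sk = np.zeros(D)
    np.add.at(sk, buckets, signs * x)
    return sk

# Two independent hash/sign pairs, one per factor of the outer product.
h1, h2 = rng.integers(0, D, d), rng.integers(0, D, d)
s1, s2 = rng.choice([-1.0, 1.0], d), rng.choice([-1.0, 1.0], d)

def tensor_sketch(a, b):
    """Sketch of a ⊗ b: circular convolution of the two CountSketches,
    computed with FFTs. The d*d outer product is never materialised."""
    fa = np.fft.rfft(count_sketch(a, h1, s1))
    fb = np.fft.rfft(count_sketch(b, h2, s2))
    return np.fft.irfft(fa * fb, n=D)

a = rng.normal(size=d); a /= np.linalg.norm(a)
b = rng.normal(size=d); b /= np.linalg.norm(b)

# <a⊗b, a⊗b> = (a·a)(b·b) = 1 for unit vectors; the sketch dot product
# approximates it from two length-D vectors instead of two d*d matrices.
exact = (a @ a) * (b @ b)
estimate = tensor_sketch(a, b) @ tensor_sketch(a, b)
```

Because the estimator is unbiased with variance shrinking as $1/D$, the sketched inner products track the exact gradient inner products closely at modest sketch sizes, which is what makes the per-step attribution affordable at scale.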
Problem

Research questions and friction points this paper is trying to address.

data attribution
looped transformers
influence estimation
recurrent computation
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Step-Decomposed Influence
looped transformers
data attribution
TensorSketch
recurrent computation