Imperfect Influence, Preserved Rankings: A Theory of TRAK for Data Attribution

2026-02-01
AI Summary
This work addresses the limited theoretical understanding of the TRAK algorithm for data attribution, in particular how its approximation errors affect the accuracy of influence estimation. Through theoretical analysis, we characterize TRAK's performance in attributing model predictions to training data and quantify the errors introduced by its reliance on a kernel machine approximation and on approximate leave-one-out (ALO) risk estimation. We show that, despite substantial approximation errors, TRAK's estimated influence remains highly correlated with the original influence and therefore largely preserves the relative ranking of data points. Simulations and empirical studies corroborate the theory, providing a theoretical justification for the method's practical effectiveness.

Abstract
Data attribution, tracing a model's prediction back to specific training data, is an important tool for interpreting sophisticated AI models. The widely used TRAK algorithm addresses this challenge by first approximating the underlying model with a kernel machine and then leveraging techniques developed for approximating the leave-one-out (ALO) risk. Despite its strong empirical performance, the theoretical conditions under which the TRAK approximations are accurate as well as the regimes in which they break down remain largely unexplored. In this paper, we provide a theoretical analysis of the TRAK algorithm, characterizing its performance and quantifying the errors introduced by the approximations on which the method relies. We show that although the approximations incur significant errors, TRAK's estimated influence remains highly correlated with the original influence and therefore largely preserves the relative ranking of data points. We corroborate our theoretical results through extensive simulations and empirical studies.
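
The claim that an inexact influence estimate can still preserve rankings can be illustrated on a toy problem. The sketch below is not TRAK and not the paper's analysis: it assumes a ridge regression model, where leave-one-out influence on a test prediction has an exact closed form, and it uses a hypothetical "uncorrected" score (the same formula with the leverage correction dropped) as a stand-in for an approximation that incurs error. All function names, the synthetic data, and the parameter values are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression coefficients: (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def exact_loo_influence(X, y, x_test, lam):
    """Influence of each training point on the test prediction,
    computed by brute force: refit the model without point i."""
    n = X.shape[0]
    full_pred = x_test @ ridge_fit(X, y, lam)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        out[i] = full_pred - x_test @ ridge_fit(X[keep], y[keep], lam)
    return out

def closed_form_influence(X, y, x_test, lam, leverage_correction=True):
    """Closed-form influence x_test^T H^{-1} x_i r_i / (1 - h_i) for ridge.
    With leverage_correction=False the 1 / (1 - h_i) factor is dropped,
    which gives a deliberately inexact score (an assumption of this sketch)."""
    d = X.shape[1]
    H_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    beta = H_inv @ (X.T @ y)
    residuals = y - X @ beta
    scores = (X @ H_inv @ x_test) * residuals
    if leverage_correction:
        leverage = np.einsum("ij,jk,ik->i", X, H_inv, X)  # h_i = x_i^T H^{-1} x_i
        scores = scores / (1.0 - leverage)
    return scores

def spearman(a, b):
    """Spearman rank correlation (assumes no ties, true for continuous data)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return np.corrcoef(ranks_a, ranks_b)[0, 1]

# Synthetic data; sizes and noise level are arbitrary choices.
rng = np.random.default_rng(0)
n, d, lam = 200, 20, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
x_test = rng.normal(size=d)

exact = exact_loo_influence(X, y, x_test, lam)
corrected = closed_form_influence(X, y, x_test, lam, leverage_correction=True)
uncorrected = closed_form_influence(X, y, x_test, lam, leverage_correction=False)

print("max abs error, corrected:  ", np.max(np.abs(exact - corrected)))
print("max abs error, uncorrected:", np.max(np.abs(exact - uncorrected)))
print("Spearman, uncorrected vs exact:", spearman(exact, uncorrected))
```

In this toy setting the uncorrected scores differ from the exact influences in magnitude but keep the same sign, so the rank correlation stays high. The approximations TRAK relies on for non-linear models are different and are what the paper analyzes.
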
Problem

Research questions and friction points this paper is trying to address.

data attribution
TRAK
influence estimation
model interpretation
approximation error
Innovation

Methods, ideas, or system contributions that make the work stand out.

data attribution
TRAK
influence estimation
theoretical analysis
ranking preservation
Han Tong
Department of Statistics, Columbia University, New York, NY, USA
Shubhangi Ghosh
Department of Statistics, Columbia University, New York, NY, USA
Haolin Zou
Department of Statistics, Columbia University, New York, NY, USA
Arian Maleki
Columbia University
High-dimensional statistics, computational imaging, machine learning, compressed sensing