AI Summary
This work addresses the limited theoretical understanding of the TRAK algorithm's foundations in data attribution and of how its approximation errors affect the accuracy of influence estimation. Through rigorous theoretical analysis, we characterize TRAK's performance in attributing model predictions to training data and quantify the errors introduced by its reliance on kernel machine approximations and approximate leave-one-out (ALO) risk estimation. We prove that, despite substantial approximation errors, TRAK consistently preserves the relative ranking of data point influences. Both theoretical and empirical results demonstrate that the method reliably maintains attribution rankings across diverse settings, thereby providing a solid theoretical justification for its practical effectiveness.
Abstract
Data attribution, tracing a model's prediction back to specific training data, is an important tool for interpreting sophisticated AI models. The widely used TRAK algorithm addresses this challenge by first approximating the underlying model with a kernel machine and then leveraging techniques developed for approximating the leave-one-out (ALO) risk. Despite its strong empirical performance, the theoretical conditions under which the TRAK approximations are accurate as well as the regimes in which they break down remain largely unexplored. In this paper, we provide a theoretical analysis of the TRAK algorithm, characterizing its performance and quantifying the errors introduced by the approximations on which the method relies. We show that although the approximations incur significant errors, TRAK's estimated influence remains highly correlated with the original influence and therefore largely preserves the relative ranking of data points. We corroborate our theoretical results through extensive simulations and empirical studies.
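The mechanism the abstract describes can be illustrated on a toy problem. The sketch below is a hypothetical, simplified stand-in for TRAK, not the paper's implementation: it uses ordinary least-squares regression as the "model" (where the kernel-machine view is exact), randomly projects per-example gradient features (for a linear model, just the inputs), and forms ALO-style influence scores using leverage corrections. It then compares their ranking against exact leave-one-out influences via a Spearman rank correlation, mirroring the claim that relative rankings are largely preserved even when the raw estimates carry approximation error. All variable names and the dimensions are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: OLS regression as a stand-in for the underlying model.
n, d, k = 300, 50, 30            # train size, feature dim, projection dim
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=d)

# Step 1 (TRAK-style): randomly project gradient features.
# For a linear model the gradient feature of example i is x_i itself.
P = rng.normal(size=(d, k)) / np.sqrt(k)
Phi = X @ P                      # projected training features, shape (n, k)
phi_test = x_test @ P

# Step 2 (ALO-style): influence of example i on the test prediction,
#   tau_i ~ phi_test^T (Phi^T Phi)^{-1} phi_i * r_i / (1 - h_ii),
# where r_i is the training residual and h_ii the leverage of example i.
lam = 1e-3                       # small ridge term for numerical stability
Ginv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(k))
h = np.einsum('ik,kl,il->i', Phi, Ginv, Phi)     # leverage scores h_ii
w_full = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ w_full
tau = (Phi @ Ginv @ phi_test) * resid / (1.0 - h)

# Exact leave-one-out influence: change in the test prediction when
# example i is dropped and the model is refit from scratch.
pred_full = x_test @ w_full
loo = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    w_i = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
    loo[i] = pred_full - x_test @ w_i

# Rank correlation: the claim is about preserving *relative* ordering,
# so Spearman correlation (computed here with plain numpy) is the metric.
def ranks(a):
    return np.argsort(np.argsort(a))

rho = np.corrcoef(ranks(tau), ranks(loo))[0, 1]
print(f"Spearman rank correlation (approx vs. exact LOO): {rho:.3f}")
```

Because the projection dimension `k` is well below `d`, the raw scores `tau` are noisy approximations of `loo`, yet their ranking should remain clearly positively correlated, which is the qualitative behavior the abstract attributes to TRAK.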