AI Summary
Existing training data attribution (TDA) methods face a trade-off between accuracy and efficiency: gradient-based approaches are theoretically rigorous but computationally expensive, while representation-based methods scale well yet rely on heuristic embeddings that are not optimized for the task, compromising fidelity. This paper introduces AirRep, the first end-to-end, scalable representation learning framework designed specifically for TDA. Its core innovations are: (1) a trainable attribution encoder coupled with attention-based pooling, yielding model-aligned, task-specific, fine-grained representations; and (2) a ranking loss grounded in empirical influence estimation, enabling efficient end-to-end optimization. Evaluated on instruction-tuned large language models, AirRep matches the accuracy of state-of-the-art gradient-based methods while being nearly two orders of magnitude faster at inference. It also demonstrates strong robustness across diverse tasks and model architectures.
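To make the pooling idea concrete, here is a minimal sketch of attention-based pooling for group-wise influence scoring. All names (`attention_pool`, `group_influence`, the single attention vector `attn_vec`, and dot-product similarity as the influence score) are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(example_embs, attn_vec):
    """Pool per-example embeddings (n, d) into a single group
    embedding (d,) using scalar attention scores from a learned
    vector attn_vec (d,). Hypothetical single-head variant."""
    weights = softmax(example_embs @ attn_vec)  # (n,), sums to 1
    return weights @ example_embs               # attention-weighted average

def group_influence(example_embs, query_emb, attn_vec):
    """Score a training subset's influence on a query as the
    similarity of the pooled group embedding to the query embedding."""
    return float(attention_pool(example_embs, attn_vec) @ query_emb)

rng = np.random.default_rng(0)
group = rng.normal(size=(5, 8))  # 5 training examples, 8-dim embeddings
query = rng.normal(size=8)       # target/query example embedding
attn = rng.normal(size=8)        # learned attention parameters (random here)
score = group_influence(group, query, attn)
```

Because the attention weights depend on the examples themselves, the pooled representation can emphasize the examples most relevant to the query, which a plain mean over embeddings cannot do.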
Abstract
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. While gradient-based attribution methods, such as influence functions, offer theoretical grounding, their computational costs make them impractical for large-scale applications. Representation-based approaches are far more scalable, but typically rely on heuristic embeddings that are not optimized for attribution, limiting their fidelity. To address these challenges, we propose AirRep, a scalable, representation-based approach that closes this gap by learning task-specific and model-aligned representations optimized explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence. We train AirRep using a ranking objective over automatically constructed training subsets labeled by their empirical effect on target predictions. Experiments on instruction-tuned LLMs demonstrate that AirRep achieves performance on par with state-of-the-art gradient-based approaches while being nearly two orders of magnitude more efficient at inference time. Further analysis highlights its robustness and generalization across tasks and models. Our code is available at https://github.com/sunnweiwei/AirRep.