Enhancing Training Data Attribution with Representational Optimization

πŸ“… 2025-05-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing training data attribution (TDA) methods face a trade-off between accuracy and efficiency: gradient-based approaches are theoretically rigorous but computationally expensive, while representation-based methods scale well yet rely on non-task-optimized heuristic embeddings, compromising fidelity. This paper introduces AirRepβ€”the first end-to-end, scalable representation learning framework specifically designed for TDA. Its core innovations are: (1) a trainable attribution encoder coupled with attention-based pooling to yield model-aligned, task-specific, fine-grained representations; and (2) a ranking loss grounded in empirical influence estimation, enabling efficient end-to-end optimization. Evaluated on instruction-tuned large language models, AirRep matches the state-of-the-art accuracy of gradient-based methods while accelerating inference by nearly two orders of magnitude. Moreover, it demonstrates strong robustness across diverse tasks and model architectures.
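The attention-based pooling described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: the learned attention query `w` and the dot-product similarity used for scoring are assumptions made for the sake of the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def pool_group(H, w):
    """Attention-pool a group of example embeddings H (n x d) into one
    group representation, with per-example weights from a query vector w."""
    weights = softmax(H @ w)   # one attention weight per training example
    return weights @ H         # (d,) weighted sum of embeddings

def influence_score(H_group, z_target, w):
    """Score a training group's influence on a target example as the
    similarity between the pooled group embedding and the target embedding."""
    g = pool_group(H_group, w)
    return float(g @ z_target)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # embeddings of 5 training examples
w = rng.normal(size=8)        # hypothetical learned attention query
z = rng.normal(size=8)        # target example embedding
score = influence_score(H, z, w)
```

In the actual framework both the encoder producing the embeddings and the pooling parameters are trained end to end; the sketch only shows how pooling turns per-example embeddings into a single group-wise influence score.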

πŸ“ Abstract
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions. While gradient-based attribution methods, such as influence functions, offer theoretical grounding, their computational costs make them impractical for large-scale applications. Representation-based approaches are far more scalable, but typically rely on heuristic embeddings that are not optimized for attribution, limiting their fidelity. To address these challenges, we propose AirRep, a scalable, representation-based approach that closes this gap by learning task-specific and model-aligned representations optimized explicitly for TDA. AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence. We train AirRep using a ranking objective over automatically constructed training subsets labeled by their empirical effect on target predictions. Experiments on instruction-tuned LLMs demonstrate that AirRep achieves performance on par with state-of-the-art gradient-based approaches while being nearly two orders of magnitude more efficient at inference time. Further analysis highlights its robustness and generalization across tasks and models. Our code is available at https://github.com/sunnweiwei/AirRep.
Problem

Research questions and friction points this paper is trying to address.

Improving scalability of training data attribution methods
Optimizing representation-based approaches for accurate attribution
Matching the accuracy of gradient-based methods at far lower computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning task-specific representations for TDA
Trainable encoder for attribution quality
Attention-based pooling for group-wise influence
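The ranking objective over training subsets can be sketched as a pairwise hinge loss: given two subsets whose empirical effect on a target prediction has been measured, the predicted influence scores should preserve that ordering. The margin value and the exact loss form below are illustrative assumptions, not the paper's formulation.

```python
def pairwise_ranking_loss(scores, labels, margin=1.0):
    """Hinge-style ranking loss over subset pairs: whenever subset i had a
    larger empirical effect than subset j (labels[i] > labels[j]), penalize
    the model unless scores[i] exceeds scores[j] by at least `margin`."""
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

# Correctly ordered scores with sufficient margin incur zero loss.
assert pairwise_ranking_loss([3.0, 2.0, 0.0], [2, 1, 0]) == 0.0
```

Because the loss depends only on the ordering of scores within labeled subset pairs, the encoder can be optimized end to end against empirical influence labels without ever computing model gradients at attribution time.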