🤖 AI Summary
This work addresses the training data attribution (TDA) problem: efficiently and accurately quantifying the influence of individual training samples on a large model's specific predictions. We propose a novel attribution paradigm that, for the first time, links uncertainty estimation with influence functions, eliminating the need for gradients or second-order derivatives and thus removing reliance on white-box model access. Our method fine-tunes an ensemble of perturbed models and computes the covariance of per-example losses across these models, enabling unified support for both white-box models and black-box APIs (e.g., the GPT series). Evaluated on vision tasks and LLM fine-tuning, it achieves significantly improved attribution accuracy, scales to models with tens of billions of parameters, and reduces computational overhead by one to two orders of magnitude compared to gradient-based methods. The approach provides a scalable, practical tool for data debugging, selection, and valuation.
📝 Abstract
Training data attribution (TDA) methods aim to identify which training examples most influence a model's predictions on specific test data. By quantifying these influences, TDA supports critical applications such as data debugging, curation, and valuation. Gradient-based TDA methods rely on gradients and second-order information, limiting their applicability at scale. While recent random projection-based methods improve scalability, they often suffer from degraded attribution accuracy. Motivated by connections between uncertainty and influence functions, we introduce Daunce - a simple yet effective data attribution approach through uncertainty estimation. Our method operates by fine-tuning a collection of perturbed models and computing the covariance of per-example losses across these models as the attribution score. Daunce is scalable to large language models (LLMs) and achieves more accurate attribution compared to existing TDA methods. We validate Daunce on tasks ranging from vision to LLM fine-tuning, and further demonstrate its compatibility with black-box model access. Applied to OpenAI's GPT models, our method achieves, to our knowledge, the first instance of data attribution on proprietary LLMs.
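The attribution score described above can be sketched numerically: given per-example losses recorded under each of K perturbed fine-tuned models, the score for a (train example, test example) pair is the covariance of their losses across the ensemble. The function name, matrix layout, and use of plain NumPy here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def daunce_scores(train_losses: np.ndarray, test_losses: np.ndarray) -> np.ndarray:
    """Covariance-of-losses attribution scores (illustrative sketch).

    train_losses: (K, N) loss of each of N training examples under each of K perturbed models
    test_losses:  (K, M) loss of each of M test examples under the same K models
    returns:      (N, M) matrix; entry (i, j) is the sample covariance, across
                  the K models, of train example i's loss with test example j's loss
    """
    # Center each example's losses across the ensemble of models
    tr = train_losses - train_losses.mean(axis=0, keepdims=True)
    te = test_losses - test_losses.mean(axis=0, keepdims=True)
    k = train_losses.shape[0]
    # Unbiased sample covariance between every (train, test) pair
    return tr.T @ te / (k - 1)

# Toy usage: train example 0's loss co-varies with the test loss, example 1's is constant
train = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])  # K=4, N=2
test = np.array([[1.0], [2.0], [3.0], [4.0]])                        # K=4, M=1
scores = daunce_scores(train, test)  # scores[0, 0] > scores[1, 0]
```

A high positive covariance indicates that perturbations which raise a training example's loss also raise the test example's loss, which is the uncertainty-based signal the method uses in place of gradients.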