Influence Functions for Scalable Data Attribution in Diffusion Models

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models pose challenges for data attribution and interpretability, specifically in estimating how much individual training samples influence generated outputs. Method: We develop a scalable influence function framework for diffusion models. We target the change in the probability of generating a particular example, measured through several proxy quantities, show that previously proposed attribution methods correspond to particular design choices within the framework, and use K-FAC approximations based on generalised Gauss-Newton matrices to keep the Hessian computations tractable for large models. Contributions/Results: Our recommended method outperforms previous data attribution approaches on common evaluations, including the Linear Data-modelling Score (LDS) and retraining without the top-influence samples, without method-specific hyperparameter tuning.
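
For orientation, the generic influence function estimate that such a framework builds on can be written as below. This is the standard textbook form under assumed notation, not a formula copied from the paper: the symbols θ*, ℓ, m, G and λ are introduced here for illustration only, and the paper's own proxy measurements and diffusion-specific K-FAC construction may differ in detail.

```latex
% Notation assumed for illustration; not taken from the paper.
% \theta^\star            : trained parameters
% N                       : number of training examples
% \ell(z_i, \theta)       : training loss on example z_i
% m(\theta)               : measurement, e.g. a proxy for the probability
%                           of generating a particular query sample
% G                       : generalised Gauss-Newton (or K-FAC) approximation
%                           to the Hessian of the training objective
% \lambda                 : damping constant
\[
  m(\theta^\star_{\setminus i}) - m(\theta^\star)
  \;\approx\;
  \frac{1}{N}\,
  \nabla_\theta m(\theta^\star)^{\top}
  \left( G + \lambda I \right)^{-1}
  \nabla_\theta \ell(z_i, \theta^\star)
\]
```

Here θ* with subscript "\ i" denotes the parameters obtained by retraining without example z_i; the right-hand side predicts the resulting change in the measurement without any retraining.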

📝 Abstract
Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models by developing an influence functions framework. Influence function-based data attribution methods approximate how a model's output would have changed if some training data were removed. In supervised learning, this is usually used for predicting how the loss on a particular example would change. For diffusion models, we focus on predicting the change in the probability of generating a particular example via several proxy measurements. We show how to formulate influence functions for such quantities and how previously proposed methods can be interpreted as particular design choices in our framework. To ensure scalability of the Hessian computations in influence functions, we systematically develop K-FAC approximations based on generalised Gauss-Newton matrices specifically tailored to diffusion models. We show that our recommended method outperforms previous data attribution approaches on common evaluations, such as the Linear Data-modelling Score (LDS) or retraining without top influences, without the need for method-specific hyperparameter tuning.
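
To make the core idea concrete, here is a minimal sketch of influence-function attribution on a toy problem. It is an illustration under loud assumptions, not the paper's method: a small damped linear regression stands in for the diffusion model, the squared-error loss on one query point stands in for the measurement, the exact Hessian is used (for this model it coincides with the generalised Gauss-Newton matrix, so no K-FAC approximation is needed), and all names (`fit`, `measurement`, `damping`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, damping = 200, 5, 1e-3

# Toy data: linear model with Gaussian noise (purely illustrative).
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=N)

# Query example whose loss plays the role of the measurement m(theta).
x_q = rng.normal(size=d)
y_q = x_q @ w_true + 1.0  # offset keeps the query residual away from zero


def fit(X_sub, y_sub, n_norm):
    """Minimise (1/n_norm) * sum 0.5*(x.theta - y)^2 + (damping/2)*|theta|^2."""
    H_sub = X_sub.T @ X_sub / n_norm + damping * np.eye(X_sub.shape[1])
    return np.linalg.solve(H_sub, X_sub.T @ y_sub / n_norm)


def measurement(theta):
    """Squared-error loss on the query example."""
    return 0.5 * (x_q @ theta - y_q) ** 2


theta = fit(X, y, N)

# Damped Hessian of the training objective. For squared error on a linear
# model the generalised Gauss-Newton matrix coincides with the Hessian.
H = X.T @ X / N + damping * np.eye(d)

# Per-example training-loss gradients (one row each) and measurement gradient.
train_grads = X * (X @ theta - y)[:, None]
query_grad = x_q * (x_q @ theta - y_q)

# Influence estimate of the change in m(theta) if example i were removed:
#   delta_m_i ~= (1/N) * grad_m^T H^{-1} grad_loss_i
predicted = train_grads @ np.linalg.solve(H, query_grad) / N

# Ground truth: actually retrain without each example (same 1/N scaling).
actual = np.array([
    measurement(fit(np.delete(X, i, axis=0), np.delete(y, i), N))
    - measurement(theta)
    for i in range(N)
])

correlation = np.corrcoef(predicted, actual)[0, 1]
print(f"Pearson correlation, predicted vs. retrained: {correlation:.4f}")
```

On a model this small, leave-one-out retraining is cheap enough to check the estimate directly. For large diffusion models that check is infeasible, which is why evaluations such as LDS compare predictions against a limited number of retrained models and why the Hessian has to be approximated.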
Problem

Research questions and friction points this paper is trying to address.

Diffusion Models
Data Attribution
Prediction Change Estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Influence Functions
Diffusion Models
Computational Efficiency