Nonparametric Data Attribution for Diffusion Models

📅 2025-10-15
🤖 AI Summary
Existing data attribution methods for diffusion models typically require gradient computations or model retraining, which limits their use in proprietary or large-scale settings. This paper proposes a nonparametric attribution method that needs neither gradients nor retraining. It measures influence through patch-level similarity between generated and training images, is grounded in the analytical form of the optimal score function, and extends naturally to multiscale representations without depending on a specific model architecture. Convolution-based acceleration keeps the purely data-driven computation efficient, and the resulting attributions are spatially interpretable. Experiments show attribution performance that closely matches gradient-based approaches and substantially outperforms existing nonparametric baselines.

📝 Abstract
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. Existing methods for diffusion models typically require access to model gradients or retraining, limiting their applicability in proprietary or large-scale settings. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images. Our approach is grounded in the analytical form of the optimal score function and naturally extends to multiscale representations, while remaining computationally efficient through convolution-based acceleration. In addition to producing spatially interpretable attributions, our framework uncovers patterns that reflect intrinsic relationships between training data and outputs, independent of any specific model. Experiments demonstrate that our method achieves strong attribution performance, closely matching gradient-based approaches and substantially outperforming existing nonparametric baselines. Code is available at https://github.com/sail-sg/NDA.
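For readers unfamiliar with the "analytical form of the optimal score function", the standard closed form for a finite training set is sketched below. The notation ($x_i$, $\alpha_t$, $\sigma_t$, $w_i$) is assumed here for illustration and is not taken from the paper, whose patch-level and multiscale formulation may differ in detail.

```latex
% Assumed setup (illustrative notation, not the paper's):
% training set \{x_i\}_{i=1}^{N}, forward process x_t = \alpha_t x_0 + \sigma_t \epsilon,
% with \epsilon \sim \mathcal{N}(0, I).
\begin{align}
p_t(x) &= \frac{1}{N} \sum_{i=1}^{N} \mathcal{N}\!\left(x;\ \alpha_t x_i,\ \sigma_t^2 I\right), \\
\nabla_x \log p_t(x) &= \frac{1}{\sigma_t^2} \sum_{i=1}^{N} w_i(x, t)\,\left(\alpha_t x_i - x\right), \\
w_i(x, t) &= \frac{\exp\!\left(-\lVert x - \alpha_t x_i \rVert^2 / (2\sigma_t^2)\right)}
                  {\sum_{j=1}^{N} \exp\!\left(-\lVert x - \alpha_t x_j \rVert^2 / (2\sigma_t^2)\right)}.
\end{align}
```

The weights $w_i$ form a softmax over training examples, so each one can be read as the share of the score contributed by training example $i$. This is presumably what makes a purely data-driven attribution possible.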
Problem

Research questions and friction points this paper is trying to address.

Quantifying training data influence on diffusion model outputs
Developing nonparametric attribution without gradients or retraining
Measuring influence via patch-level image similarity analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonparametric method using patch-level similarity
Multiscale representation with convolution-based acceleration
Model-independent attribution via analytical score function
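The points above can be illustrated with a minimal sketch of patch-level attribution. It is not the authors' implementation (see the linked repository for that). The function names `extract_patches` and `patch_attribution`, the same-location patch comparison, the single scale, and the bandwidth `sigma` are all assumptions made for illustration, and the sketch uses brute-force distances rather than the convolution-based acceleration the paper describes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(img, k):
    """Return every k x k patch of a 2-D image as a row of a (P, k*k) matrix."""
    windows = sliding_window_view(img, (k, k))  # shape (H-k+1, W-k+1, k, k)
    return windows.reshape(-1, k * k)


def patch_attribution(generated, train_images, k=3, sigma=0.5):
    """Hypothetical patch-level attribution (illustrative, not the paper's method).

    For each patch location, a softmax over training images of the negative
    squared patch distance gives per-location weights; averaging them over
    locations gives one score per training image.
    """
    q = extract_patches(generated, k)                              # (P, k*k)
    t = np.stack([extract_patches(x, k) for x in train_images])    # (N, P, k*k)
    d2 = ((t - q[None]) ** 2).sum(axis=-1)                         # (N, P)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=0, keepdims=True)                    # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=0, keepdims=True)                              # softmax over images
    out_h = generated.shape[0] - k + 1
    maps = w.reshape(len(train_images), out_h, -1)                 # spatial weight maps
    return w.mean(axis=1), maps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [rng.random((8, 8)) for _ in range(4)]
    # A "generated" image that is a slightly perturbed copy of training image 2.
    gen = train[2] + 0.01 * rng.standard_normal((8, 8))
    scores, maps = patch_attribution(gen, train)
    print("scores:", np.round(scores, 3))
    print("most influential training index:", int(scores.argmax()))
```

The per-location maps are what would make such an attribution spatially interpretable: each map shows where in the generated image a given training example dominates.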
Authors
Yutian Zhao
Sea AI Lab, Singapore; Department of Mathematics, National University of Singapore
Chao Du
Sea AI Lab, Singapore
Xiaosen Zheng
Researcher @ TikTok
Code AI, Data-Centric AI
Tianyu Pang
Sea AI Lab, Singapore
Min Lin
Principal Research Scientist, Sea AI Lab
Artificial Intelligence