Nonparametric Data Attribution for Diffusion Models

📅 2025-10-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing data attribution methods for diffusion models typically require gradient computations or model retraining, which limits their use in proprietary or large-scale settings. This paper proposes a nonparametric attribution method that needs neither gradients nor retraining. The method measures patch-wise similarity between generated and training images, with attribution grounded in the analytical form of the optimal score function. It extends naturally to multiscale representations and does not depend on a specific model architecture. Convolution-based acceleration keeps the purely data-driven computation efficient, and the patch-level formulation makes the attributions spatially interpretable. Experiments show attribution performance that closely matches gradient-based approaches and substantially outperforms existing nonparametric baselines.

📝 Abstract

Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. Existing methods for diffusion models typically require access to model gradients or retraining, limiting their applicability in proprietary or large-scale settings. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images. Our approach is grounded in the analytical form of the optimal score function and naturally extends to multiscale representations, while remaining computationally efficient through convolution-based acceleration. In addition to producing spatially interpretable attributions, our framework uncovers patterns that reflect intrinsic relationships between training data and outputs, independent of any specific model. Experiments demonstrate that our method achieves strong attribution performance, closely matching gradient-based approaches and substantially outperforming existing nonparametric baselines. Code is available at https://github.com/sail-sg/NDA.
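
For reference, the "analytical form of the optimal score function" mentioned in the abstract usually refers to the standard closed form below, which holds when the data distribution is the empirical distribution over the training set. The notation (\alpha_t, \sigma_t, w_i) is an assumption made here for illustration; the paper may use a different parametrization.

```latex
% Assumed forward process: x_t = \alpha_t x_0 + \sigma_t \epsilon, with \epsilon \sim \mathcal{N}(0, I)
% and x_0 drawn uniformly from the training set \{x_i\}_{i=1}^{N}.
\begin{align}
p_t(x) &= \frac{1}{N} \sum_{i=1}^{N} \mathcal{N}\!\left(x;\ \alpha_t x_i,\ \sigma_t^2 I\right), \\
\nabla_x \log p_t(x) &= \sum_{i=1}^{N} w_i(x, t)\, \frac{\alpha_t x_i - x}{\sigma_t^2}, \\
w_i(x, t) &= \frac{\exp\!\left(-\lVert x - \alpha_t x_i \rVert^2 / (2\sigma_t^2)\right)}
                 {\sum_{j=1}^{N} \exp\!\left(-\lVert x - \alpha_t x_j \rVert^2 / (2\sigma_t^2)\right)}.
\end{align}
```

The softmax weights w_i depend only on distances between the noisy sample and the training points, so they can be computed from data alone, without access to a trained model.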

Problem

Research questions and friction points this paper is trying to address.

Quantifying training data influence on diffusion model outputs

Developing nonparametric attribution without gradients or retraining

Measuring influence via patch-level image similarity analysis
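
To make the patch-level idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation (that lives in the linked repository). The function name `patch_attribution`, the parameters `patch` and `sigma`, and the restriction to co-located patches at a single scale are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def patch_attribution(generated, train_images, patch=3, sigma=0.5):
    """Toy patch-wise attribution (hypothetical helper, not the paper's code).

    At every patch location, take a softmax over training images of the
    negative squared distance between co-located patches.

    generated:    (H, W) array, the image to explain
    train_images: (N, H, W) array, candidate training images
    Returns:
      per_patch: (N, H-patch+1, W-patch+1) weights that sum to 1 over N
      per_image: (N,) mean weight of each training image
    """
    # All patch x patch windows of the generated image: (H', W', p, p)
    gen_patches = sliding_window_view(generated, (patch, patch))
    # Same windows for every training image: (N, H', W', p, p)
    train_patches = sliding_window_view(train_images, (patch, patch), axis=(1, 2))
    # Squared distance between co-located patches: (N, H', W')
    d2 = ((train_patches - gen_patches[None]) ** 2).sum(axis=(-1, -2))
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    per_patch = weights / weights.sum(axis=0, keepdims=True)
    per_image = per_patch.mean(axis=(1, 2))
    return per_patch, per_image


# Usage: the "generated" image is a noisy copy of training image 2.
rng = np.random.default_rng(0)
train = rng.normal(size=(4, 8, 8))
generated = train[2] + 0.05 * rng.normal(size=(8, 8))
per_patch, per_image = patch_attribution(generated, train)
print(per_image)
```

In this toy setup the near-duplicate training image should receive almost all of the weight, and `per_patch` gives a spatial map showing where each training image contributes.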

Innovation

Methods, ideas, or system contributions that make the work stand out.

Nonparametric method using patch-level similarity

Multiscale representation with convolution-based acceleration

Model-independent attribution via analytical score function
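
The summary does not spell out how the convolution-based acceleration works. One standard way to get such a speedup, sketched here as an assumption and not as the paper's algorithm, is to expand the squared patch distance as ||u - v||^2 = ||u||^2 - 2<u, v> + ||v||^2 and compute the inner-product term for all patch positions at once with a cross-correlation. The helper names `valid_xcorr` and `patch_sq_dists` are hypothetical.

```python
import numpy as np


def valid_xcorr(image, kernel):
    """'Valid' 2-D cross-correlation computed with FFTs.

    out[i, j] = sum over (a, b) of image[i + a, j + b] * kernel[a, b]
    """
    H, W = image.shape
    p, q = kernel.shape
    # Zero-pad the kernel to the image size and correlate in the frequency domain.
    spec = np.fft.rfft2(image) * np.conj(np.fft.rfft2(kernel, s=(H, W)))
    full = np.fft.irfft2(spec, s=(H, W))  # circular cross-correlation
    # Keep only the positions where the kernel does not wrap around.
    return full[: H - p + 1, : W - q + 1]


def patch_sq_dists(image, query):
    """Squared distance from `query` to every same-sized patch of `image`."""
    patch_norms = valid_xcorr(image ** 2, np.ones_like(query))  # ||u||^2 per patch
    cross = valid_xcorr(image, query)                           # <u, v> per patch
    return patch_norms - 2.0 * cross + (query ** 2).sum()


# Usage: search for a patch that was cut out of the image itself.
rng = np.random.default_rng(0)
image = rng.normal(size=(7, 9))
query = image[2:5, 3:6].copy()
dists = patch_sq_dists(image, query)
print(np.unravel_index(dists.argmin(), dists.shape))
```

The distance map has one entry per patch position, and its minimum should fall at the location the query was taken from. The same identity works with any correlation routine, including the convolution layers of a deep learning framework.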

🔎 Similar Papers

CausalConceptTS: Causal Attributions for Time Series Classification using High Fidelity Diffusion Models