🤖 AI Summary
To address the limitations of existing data attribution methods for diffusion models—which typically require gradient computations or model retraining and are therefore hard to apply in proprietary or large-scale settings—this paper proposes a gradient-free, retraining-free, nonparametric attribution method. The approach leverages local patch-wise similarity between generated and training images, performing attribution via an analytically derived optimal score function in a multi-scale feature space. This constitutes the first natural extension of nonparametric attribution to multi-scale representations, and it does not rely on any specific model architecture. By combining convolution-based acceleration with a purely data-driven framework, the method achieves both spatial interpretability and computational efficiency. Experiments demonstrate that it attains attribution accuracy comparable to gradient-based approaches, significantly outperforms existing nonparametric baselines, and scales effectively to large datasets and real-world deployment scenarios.
📝 Abstract
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. Existing methods for diffusion models typically require access to model gradients or retraining, limiting their applicability in proprietary or large-scale settings. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images. Our approach is grounded in the analytical form of the optimal score function and naturally extends to multiscale representations, while remaining computationally efficient through convolution-based acceleration. In addition to producing spatially interpretable attributions, our framework uncovers patterns that reflect intrinsic relationships between training data and outputs, independent of any specific model. Experiments demonstrate that our method achieves strong attribution performance, closely matching gradient-based approaches and substantially outperforming existing nonparametric baselines. Code is available at https://github.com/sail-sg/NDA.
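To make the core idea concrete, here is a minimal, hypothetical sketch of patch-level attribution: each training image is scored by how well its patches match patches of the generated image. This is an illustrative simplification, not the paper's actual multiscale, convolution-accelerated method; the function name, patch size, and aggregation rule are assumptions for exposition.

```python
import numpy as np

def patch_similarity_scores(generated, training_set, patch=8, stride=8):
    """Illustrative sketch (not the paper's exact algorithm): score each
    training image by its best patch-wise cosine similarity to the
    generated image, averaged over the generated image's patches."""
    def patches(img):
        # Extract non-overlapping patches and L2-normalize each one.
        H, W = img.shape
        out = []
        for i in range(0, H - patch + 1, stride):
            for j in range(0, W - patch + 1, stride):
                p = img[i:i + patch, j:j + patch].ravel()
                n = np.linalg.norm(p)
                out.append(p / n if n > 0 else p)
        return np.stack(out)  # shape: (num_patches, patch * patch)

    g = patches(generated)
    scores = []
    for t in training_set:
        sims = g @ patches(t).T              # all pairwise patch cosine sims
        scores.append(sims.max(axis=1).mean())  # best match per generated patch
    return np.array(scores)                  # higher = more influential

# Toy sanity check: a training image identical to the output scores highest.
rng = np.random.default_rng(0)
gen = rng.random((32, 32))
train = [rng.random((32, 32)), gen.copy(), rng.random((32, 32))]
scores = patch_similarity_scores(gen, train)
print(int(np.argmax(scores)))  # → 1
```

In the paper's actual method, this pairwise patch comparison is accelerated with convolutions and computed across multiple scales; the naive loop above is only meant to convey the data-driven, gradient-free character of the approach.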