🤖 AI Summary
This work addresses the challenge of efficiently attributing the influence of specific training-data groups—such as artistic styles or semantic categories—on the outputs of diffusion models. To this end, it introduces, for the first time, a counterfactual modeling framework based on machine unlearning that enables group-level attribution without retraining. The method approximates a "leave-one-group-out" (LOGO) model by unlearning a target data group and uses ELBO-based likelihood scoring to quantify each group's contribution. Evaluated on CIFAR-10 and on artistic-style attribution with Stable Diffusion, the approach accurately identifies dominant training groups while achieving an approximately 100× speedup over conventional LOGO retraining. It substantially outperforms baseline methods—including semantic similarity, gradient-based, and instance-level attribution—striking a favorable balance between attribution accuracy and computational scalability.
📝 Abstract
Training-data attribution for vision generative models aims to identify which training data influenced a given output. While most methods score individual examples, practitioners often need group-level answers (e.g., artistic styles or object classes). Group-wise attribution is counterfactual: how would a model's behavior on a generated sample change if a group were absent from training? A natural realization of this counterfactual is Leave-One-Group-Out (LOGO) retraining, which retrains the model with each group removed; however, it becomes computationally prohibitive as the number of groups grows. We propose GUDA (Group Unlearning-based Data Attribution) for diffusion models, which approximates each counterfactual model by applying machine unlearning to a shared full-data model instead of training from scratch. GUDA quantifies group influence using differences in a likelihood-based scoring rule (ELBO) between the full model and each unlearned counterfactual. Experiments on CIFAR-10 and on artistic-style attribution with Stable Diffusion show that GUDA identifies primary contributing groups more reliably than semantic similarity, gradient-based attribution, and instance-level unlearning approaches, while achieving a 100× speedup on CIFAR-10 over LOGO retraining.
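The ELBO-difference scoring rule described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the function name `guda_scores` and all ELBO values are hypothetical stand-ins, and a real system would evaluate the diffusion ELBO of the generated sample under the full model and under each group-unlearned model.

```python
# Hedged sketch of GUDA-style group scoring (all names/values hypothetical).
# For each group g, influence = ELBO of the sample under the full model
# minus its ELBO under the counterfactual model with g unlearned:
#     score(g) = ELBO_full(x) - ELBO_unlearned_g(x)
# A large positive score means the sample's likelihood depends heavily on g.

def guda_scores(elbo_full: float, elbo_unlearned: dict[str, float]) -> dict[str, float]:
    """Per-group influence as the drop in ELBO after unlearning that group."""
    return {g: elbo_full - e for g, e in elbo_unlearned.items()}

# Toy made-up ELBO values (in nats) for one generated sample:
elbo_full = -3.20
elbo_unlearned = {"impressionism": -5.10, "cubism": -3.35, "pop_art": -3.25}

scores = guda_scores(elbo_full, elbo_unlearned)
top_group = max(scores, key=scores.get)  # group whose removal hurts likelihood most
```

Here removing "impressionism" causes the largest ELBO drop, so it would be attributed as the primary contributing group for this sample.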