Learning to Weight Parameters for Data Attribution

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses data attribution in generative models—identifying the training samples most influential to a given output—and identifies a key limitation of existing methods: by disregarding the model's hierarchical architecture, they produce coarse-grained, structure-agnostic attributions. To overcome this, the authors propose a learnable, hierarchical parameter importance weighting mechanism, enabling fine-grained, architecture-aware data attribution without requiring labeled data. The approach integrates gradient-based attribution with unsupervised modeling of parameter importance across layers, where layer-specific weights adaptively capture how different layers extract distinct semantic information—e.g., subject, style, and background. Extensive evaluation on multiple diffusion models demonstrates substantial improvements in attribution accuracy and enables interpretable localization of the semantic origins of generated outputs.

📝 Abstract
We study data attribution in generative models, aiming to identify which training examples most influence a given output. Existing methods achieve this by tracing gradients back to training data. However, they typically treat all network parameters uniformly, ignoring the fact that different layers encode different types of information and may thus draw information differently from the training set. We propose a method that models this by learning parameter importance weights tailored for attribution, without requiring labeled data. This allows the attribution process to adapt to the structure of the model, capturing which training examples contribute to specific semantic aspects of an output, such as subject, style, or background. Our method improves attribution accuracy across diffusion models and enables fine-grained insights into how outputs borrow from training data.
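The core idea—scoring each training example by gradient similarity, but weighting each layer's contribution by a learned importance rather than treating all parameters uniformly—can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name, the layer names, and the hand-set weights are all hypothetical, and the actual method learns the weights without labels.

```python
import numpy as np

def layer_weighted_attribution(test_grads, train_grads, layer_weights):
    """Score a training example's influence on a test output as a
    weighted sum of per-layer gradient cosine similarities.

    test_grads / train_grads: dict mapping layer name -> flat gradient vector
    layer_weights: dict mapping layer name -> importance weight
    (All names and structures are illustrative, not the paper's API.)
    """
    score = 0.0
    for layer, w in layer_weights.items():
        g_test, g_train = test_grads[layer], train_grads[layer]
        denom = np.linalg.norm(g_test) * np.linalg.norm(g_train) + 1e-12
        score += w * (g_test @ g_train) / denom
    return score

rng = np.random.default_rng(0)
layers = ["down_block", "mid_block", "up_block"]  # hypothetical layer names

# Synthetic gradients: the training example aligns with the test output
# only in the mid block (e.g., it contributed the "style" of the output).
test_grads = {l: rng.normal(size=64) for l in layers}
train_grads = {l: rng.normal(size=64) for l in layers}
train_grads["mid_block"] = test_grads["mid_block"] + 0.1 * rng.normal(size=64)

uniform = {l: 1.0 for l in layers}  # structure-agnostic baseline
style_focused = {"down_block": 0.1, "mid_block": 0.8, "up_block": 0.1}

print("uniform weights:      ", layer_weighted_attribution(test_grads, train_grads, uniform))
print("style-focused weights:", layer_weighted_attribution(test_grads, train_grads, style_focused))
```

With weights concentrated on the aligned layer, the score isolates that semantic contribution; uniform weights dilute it with noise from unrelated layers—the structure-agnostic behavior the paper argues against.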
Problem

Research questions and friction points this paper is trying to address.

Identify influential training data for generative model outputs
Learn parameter importance weights for accurate data attribution
Capture training contributions to specific output semantic aspects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns parameter importance weights for attribution
Adapts to model structure without labeled data
Improves attribution accuracy across diffusion models