Learning to Weight Parameters for Data Attribution

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses data attribution in generative models—identifying the training samples most influential to a given output—and identifies a key limitation of existing methods: by disregarding the model's hierarchical architecture, they produce coarse-grained, structure-agnostic attributions. To overcome this, the authors propose a learnable, hierarchical parameter importance weighting mechanism, enabling fine-grained, architecture-aware data attribution without requiring labeled data. The approach integrates gradient-based attribution with unsupervised modeling of parameter importance across layers, where layer-specific weights adaptively capture how different layers extract distinct semantic information—e.g., subject, style, and background. Extensive evaluation on multiple diffusion models demonstrates substantial improvements in attribution accuracy and enables interpretable localization of the semantic origins of generated outputs.

📝 Abstract
We study data attribution in generative models, aiming to identify which training examples most influence a given output. Existing methods achieve this by tracing gradients back to training data. However, they typically treat all network parameters uniformly, ignoring the fact that different layers encode different types of information and may thus draw information differently from the training set. We propose a method that models this by learning parameter importance weights tailored for attribution, without requiring labeled data. This allows the attribution process to adapt to the structure of the model, capturing which training examples contribute to specific semantic aspects of an output, such as subject, style, or background. Our method improves attribution accuracy across diffusion models and enables fine-grained insights into how outputs borrow from training data.
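The core idea—scoring each training example by gradient similarity, but weighting each layer's contribution by a learned importance rather than treating all parameters uniformly—can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name, the layer names, and the hand-set weights are all hypothetical, and the actual method learns the weights without labels.

```python
import numpy as np

def layer_weighted_attribution(test_grads, train_grads, layer_weights):
    """Score a training example's influence on a test output as a
    weighted sum of per-layer gradient cosine similarities.

    test_grads / train_grads: dict mapping layer name -> flat gradient vector
    layer_weights: dict mapping layer name -> importance weight
    (All names and structures are illustrative, not the paper's API.)
    """
    score = 0.0
    for layer, w in layer_weights.items():
        g_test, g_train = test_grads[layer], train_grads[layer]
        denom = np.linalg.norm(g_test) * np.linalg.norm(g_train) + 1e-12
        score += w * (g_test @ g_train) / denom
    return score

rng = np.random.default_rng(0)
layers = ["down_block", "mid_block", "up_block"]  # hypothetical layer names

# Synthetic gradients: the training example aligns with the test output
# only in the mid block (e.g., it contributed the "style" of the output).
test_grads = {l: rng.normal(size=64) for l in layers}
train_grads = {l: rng.normal(size=64) for l in layers}
train_grads["mid_block"] = test_grads["mid_block"] + 0.1 * rng.normal(size=64)

uniform = {l: 1.0 for l in layers}  # structure-agnostic baseline
style_focused = {"down_block": 0.1, "mid_block": 0.8, "up_block": 0.1}

print("uniform weights:      ", layer_weighted_attribution(test_grads, train_grads, uniform))
print("style-focused weights:", layer_weighted_attribution(test_grads, train_grads, style_focused))
```

With weights concentrated on the aligned layer, the score isolates that semantic contribution; uniform weights dilute it with noise from unrelated layers—the structure-agnostic behavior the paper argues against.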
Problem

Research questions and friction points this paper is trying to address.

Identify influential training data for generative model outputs
Learn parameter importance weights for accurate data attribution
Capture training contributions to specific output semantic aspects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns parameter importance weights for attribution
Adapts to model structure without labeled data
Improves attribution accuracy across diffusion models