AI Summary
Existing data valuation methods primarily target discriminative models and lack adaptability to generative models; moreover, current generative valuation approaches often rely on specific architectures and suffer from limited robustness and efficiency. This paper proposes GMValuator, the first training-free, model-agnostic framework for generative data valuation. It leverages fine-grained similarity matching between generated samples and training data, incorporates an image-quality-aware bias calibration mechanism, and establishes a four-dimensional interpretable evaluation criterion: reasonableness, fidelity, diversity, and consistency. The method integrates no-reference image quality assessment (NR-IQA), cross-domain nearest-neighbor retrieval, and contribution attribution propagation. Extensive experiments on StyleGAN2, DDPM, and benchmarks including FFHQ and CIFAR-10 demonstrate that GMValuator achieves significantly higher valuation accuracy than state-of-the-art baselines, with over 5× improvement in computational efficiency.
Abstract
Data valuation plays a crucial role in machine learning. Existing data valuation methods have primarily focused on discriminative models, neglecting generative models, which have recently gained considerable attention. The few existing data valuation methods designed for deep generative models either concentrate on specific models or lack robustness in their outcomes. Moreover, efficiency remains a notable shortcoming. To bridge these gaps, we formulate the data valuation problem in generative models from a similarity-matching perspective. Specifically, we introduce Generative Model Valuator (GMValuator), the first training-free and model-agnostic approach to provide data valuation for generation tasks. It enables efficient data valuation through our novel similarity-matching module, calibrates biased contributions by incorporating image quality assessment, and attributes credit to all training samples based on their contributions to the generated samples. Additionally, we introduce four evaluation criteria for assessing data valuation methods in generative models, aligning with principles of plausibility and truthfulness. GMValuator is extensively evaluated on various datasets and generative architectures to demonstrate its effectiveness.
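The three steps named in the abstract (similarity matching, quality calibration, credit attribution) can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not spell out. The function name `value_training_data`, the use of squared Euclidean distance on precomputed feature vectors, the softmax weighting, and the parameter `k` are all assumptions chosen for illustration, and the quality scores are assumed to come from some external image quality model.

```python
import numpy as np

def value_training_data(train_feats, gen_feats, gen_quality, k=3):
    """Hypothetical similarity-matching valuation (illustrative only).

    train_feats: (n, d) feature vectors of training samples.
    gen_feats:   (m, d) feature vectors of generated samples.
    gen_quality: (m,) quality score per generated sample (assumed given).
    Returns an (n,) array of normalized value scores.
    """
    train_feats = np.asarray(train_feats, dtype=float)
    gen_feats = np.asarray(gen_feats, dtype=float)
    gen_quality = np.asarray(gen_quality, dtype=float)
    n = train_feats.shape[0]
    k = min(k, n)

    # Pairwise squared Euclidean distances, shape (m, n).
    diff = gen_feats[:, None, :] - train_feats[None, :, :]
    dists = np.sum(diff ** 2, axis=-1)

    values = np.zeros(n)
    for j in range(gen_feats.shape[0]):
        # Similarity matching: the k training samples closest to sample j.
        nearest = np.argsort(dists[j])[:k]
        # Closer matches get larger weights (softmax over negative distance).
        logits = -dists[j, nearest]
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        # Quality calibration: low-quality generations contribute less credit.
        values[nearest] += gen_quality[j] * weights

    # Credit attribution: normalize so the scores sum to 1 when any credit exists.
    total = values.sum()
    return values / total if total > 0 else values

# Toy usage: two training points near the generated samples, one far away.
train = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
gen = [[0.1, 0.0], [0.9, 0.1]]
scores = value_training_data(train, gen, gen_quality=[1.0, 0.5], k=2)
```

In this toy setup the distant training point is never among the two nearest neighbors, so it receives zero value, and the first training point scores highest because its closest generated sample has the higher quality score.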