🤖 AI Summary
Existing evaluation methods for generative models often rely on assumptions about the true data density, require auxiliary pretrained models, or depend on handcrafted feature engineering. To address these limitations, we propose PQMass, a nonparametric, likelihood-free evaluation framework that makes no assumptions about the functional form of the underlying density and requires no external models or feature extraction. PQMass adaptively partitions the multidimensional sample space into non-overlapping bins, estimates the probability mass in each bin for both real and generated data, and quantifies the discrepancy with a chi-squared test on the bin counts, yielding a statistically rigorous p-value. This is the first approach to unify spatial binning with chi-squared testing for joint assessment of generative model quality, novelty, and diversity. Theoretically grounded and plug-and-play, PQMass is shown to be effective on data of several modalities and on moderately high-dimensional datasets, without dimensionality reduction, feature engineering, or prohibitive computational overhead.
📝 Abstract
We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of generative models. The proposed approach, PQMass, provides a statistically rigorous way to assess the performance of a single generative model or to compare multiple competing models. PQMass divides the sample space into non-overlapping regions and applies chi-squared tests to the number of data samples that fall within each region, giving a p-value for the null hypothesis that the bin counts derived from the two sets of samples are drawn from the same multinomial distribution. PQMass does not depend on assumptions regarding the density of the true distribution, nor does it rely on training or fitting any auxiliary models. We evaluate PQMass on data of various modalities and dimensions, demonstrating its effectiveness in assessing the quality, novelty, and diversity of generated samples. We further show that PQMass scales well to moderately high-dimensional data and thus obviates the need for feature extraction in practical applications.
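The procedure described in the abstract (partition the space, count samples per region, run a chi-squared test on the counts) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the regions are built, so the sketch assumes nearest-center (Voronoi-style) regions around points drawn at random from the pooled samples. The function names `region_counts` and `two_sample_chi2`, the parameter `n_regions`, and the choice of Euclidean distance are all hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import chi2_contingency


def region_counts(samples, centers):
    """Count the samples whose nearest center (Euclidean) is each center."""
    nearest = cdist(samples, centers).argmin(axis=1)
    return np.bincount(nearest, minlength=len(centers))


def two_sample_chi2(x, y, n_regions=20, seed=0):
    """Chi-squared test on region counts of two sample sets.

    Assumption: regions are nearest-center cells around points picked at
    random from the pooled samples. The source only states that the
    regions are non-overlapping.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    idx = rng.choice(len(pooled), size=n_regions, replace=False)
    centers = pooled[idx]

    # 2 x n_regions contingency table: one row of counts per sample set.
    table = np.vstack([region_counts(x, centers), region_counts(y, centers)])
    # Drop regions that received no samples from either set.
    table = table[:, table.sum(axis=0) > 0]

    chi2, p_value, dof, _ = chi2_contingency(table, correction=False)
    return chi2, p_value, dof


# Usage: compare samples from the same and from shifted Gaussians.
rng = np.random.default_rng(1)
real = rng.normal(size=(2000, 5))
same = rng.normal(size=(2000, 5))
shifted = rng.normal(loc=3.0, size=(2000, 5))

print("same distribution:   ", two_sample_chi2(real, same))
print("shifted distribution:", two_sample_chi2(real, shifted))
```

A small p-value is evidence against the hypothesis that both sets share the same region probabilities. Because the partition here depends on one random draw of centers, repeating the test with several seeds and inspecting the spread of the resulting statistics is a reasonable precaution in this simplified setting.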