🤖 AI Summary
Statistical analysis of black-box generative models, whose weights, pretraining data, and other model-level covariates are inaccessible, remains challenging: inference must rely entirely on the models' input-output behavior.
Method: This paper introduces a data-centric kernel embedding framework that maps each generative model into a reproducing kernel Hilbert space (RKHS) induced by its output sample distribution, yielding model-level comparable representations. The method integrates functional-space projection, maximum mean discrepancy (MMD)-based distributional distance estimation, and nonparametric hypothesis testing to enable interpretable, cross-model statistical inference without requiring internal model access.
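The MMD-based distributional distance at the core of this pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian kernel and that each model's outputs have already been embedded as fixed-length feature vectors; all function names and the bandwidth parameter are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD between two sample sets.

    X, Y: arrays of shape (m, d) and (n, d) holding feature embeddings
    of outputs drawn from two black-box generative models.
    """
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Drop diagonal (self-similarity) terms for the unbiased estimator.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = 2.0 * Kxy.mean()
    return term_x + term_y - term_xy
```

Under this sketch, each model is identified with the mean embedding of its output distribution in the kernel's RKHS, and `mmd2_unbiased` estimates the squared RKHS distance between two such embeddings from samples alone, which is what allows downstream tasks like clustering or hypothesis testing to operate without any access to model internals.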
Contribution/Results: It is the first approach to achieve purely input–output behavior-driven kernel-space embedding of generative models, circumventing black-box constraints. Evaluated on model clustering, anomaly detection, and performance attribution, it significantly outperforms baselines while exhibiting strong generalizability and plug-and-play applicability. This work establishes a novel, covariate-free paradigm for evaluating generative models under strict black-box conditions.
📝 Abstract
Generative models are capable of producing human-expert-level content across a variety of topics and domains. As the impact of generative models grows, it is necessary to develop statistical methods to understand collections of available models. These methods are particularly important in settings where the user may not have access to information related to a model's pre-training data, weights, or other relevant model-level covariates. In this paper, we extend recent results on representations of black-box generative models to model-level statistical inference tasks. We demonstrate that the model-level representations are effective for multiple inference tasks.