🤖 AI Summary
Bayesian density estimation and clustering face an inherent tension between interpretability and flexibility: overparameterized models fit the data closely but lack transparency. This paper proposes a posterior-projection summarization framework that, grounded in decision theory, compresses the high-dimensional posterior predictive distribution into a low-dimensional parametric density and clustering representation, preserving most of the original model's fit while quantifying uncertainty. The approach is presented as the first to systematically integrate nonparametric modeling, posterior dimensionality reduction, and Bayesian uncertainty propagation, balancing statistical interpretability against fit fidelity. Experiments on synthetic and real-world datasets indicate that the resulting summaries offer theoretical guarantees and practical utility: the loss in fit remains tightly controlled, while model transparency and interpretability for downstream analysis improve substantially.
📝 Abstract
The usefulness of Bayesian models for density and cluster estimation is well established across multiple literatures. However, there remains a known tension between simpler, more interpretable models and more flexible, complex ones. In this paper, we propose a novel method that reconciles the two by projecting the fit of a flexible, overparameterized model onto a lower-dimensional parametric summary, which serves as a surrogate. This projection increases interpretability while preserving most of the fit of the original model. Our approach involves three main steps. First, we fit the data using a nonparametric or overparameterized model. Second, we project the posterior predictive distribution of the original model onto a sequence of parametric summary estimates using a decision-theoretic approach. Finally, having selected in the second step the parametric dimension whose summary estimate best approximates the original model, we construct uncertainty quantification for the summary by projecting the original full posterior distribution. We demonstrate the effectiveness of our method in summarizing a variety of nonparametric and overparameterized models, providing uncertainty quantification for both density and cluster summaries on synthetic and real datasets.
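The three steps described in the abstract can be illustrated with a small one-dimensional sketch. It is not the authors' implementation, and every modelling choice in it is an assumption made for illustration: a Bayesian bootstrap (Dirichlet weights over the observations) stands in for the flexible model, the summary family is a Gaussian mixture with `k` components, the loss is the expected negative log density under the predictive (a Kullback-Leibler-type projection), and the dimension is chosen as the smallest `k` whose loss is within a hypothetical tolerance `tol` of the best. The names `project`, `mixture_loglik`, `tol`, and `k_star` are likewise hypothetical.

```python
import numpy as np

def mixture_loglik(x, pi, mu, sd):
    """Return per-point log density of a 1-D Gaussian mixture and the
    per-component log terms (shape n x k)."""
    logp = (np.log(pi) - np.log(sd) - 0.5 * np.log(2 * np.pi)
            - 0.5 * ((x[:, None] - mu) / sd) ** 2)
    m = logp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(logp - m).sum(axis=1)), logp

def project(x, w, k, n_iter=100):
    """Weighted EM: approximately minimise -sum_i w_i log q(x_i) over
    k-component Gaussian mixtures q. Returns (pi, mu, sd, loss)."""
    order = np.argsort(x)
    cdf = np.cumsum(w[order])
    # Initialise the means at evenly spaced weighted quantiles.
    idx = np.searchsorted(cdf, (np.arange(k) + 0.5) / k).clip(0, len(x) - 1)
    mu = x[order][idx].astype(float)
    sd_floor = 0.05 * x.std()          # assumed floor to avoid degenerate components
    sd = np.full(k, x.std() / k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        logmix, logp = mixture_loglik(x, pi, mu, sd)
        r = np.exp(logp - logmix[:, None]) * w[:, None]   # weighted responsibilities
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sd = np.maximum(np.sqrt(var), sd_floor)
    logmix, _ = mixture_loglik(x, pi, mu, sd)
    return pi, mu, sd, -float(w @ logmix)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)])
n = len(x)

# Step 1 (stand-in): posterior draws from a Bayesian bootstrap, each draw
# being a weight vector over the observed points.
post_w = rng.dirichlet(np.ones(n), size=200)
pred_w = post_w.mean(axis=0)           # Monte Carlo posterior predictive weights

# Step 2: project the predictive onto summaries of increasing dimension and
# keep the smallest k whose loss is within tol of the best.
ks = [1, 2, 3, 4]
losses = [project(x, pred_w, k)[3] for k in ks]
tol = 0.05                             # hypothetical tolerance, in nats
k_star = next(k for k, loss in zip(ks, losses) if loss <= min(losses) + tol)

# Step 3: project each posterior draw onto the k_star summary to obtain a
# posterior for the summary parameters (means sorted to avoid label switching).
mu_draws = np.array([np.sort(project(x, w, k_star)[1]) for w in post_w])
lo, hi = np.percentile(mu_draws, [2.5, 97.5], axis=0)
print("chosen k:", k_star)
print("95% intervals for component means:", list(zip(lo.round(2), hi.round(2))))
```

Fixing `k_star` before step 3 means the intervals describe uncertainty in the summary parameters conditional on the selected dimension. A different loss, summary family, or selection rule would slot into `project` and the `k_star` line without changing the overall structure.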