🤖 AI Summary
This paper addresses modeling population heterogeneity in mixture-of-linear-regression (i.e., random-coefficient) models. The authors propose a fully nonparametric maximum likelihood estimator (NPMLE) for the unknown mixing distribution $G^*$, without prespecifying its parametric form or the number of components. The method computes the NPMLE directly via convex optimization, establishes the estimator's existence, and proves a finite-sample Hellinger error bound at the parametric rate (up to logarithmic factors), avoiding the discretization-induced bias of conventional approaches. The estimator achieves both statistical efficiency and computational tractability: it outperforms EM-based parametric methods on simulations with both discrete and continuous mixing distributions as well as on two real-world datasets, demonstrating robustness and practical utility.
📝 Abstract
Mixtures of regression models are useful for regression analysis in heterogeneous populations, where a single regression model may not be appropriate for the entire population. We study the nonparametric maximum likelihood estimator (NPMLE) for fitting these models. The NPMLE is based on convex optimization and does not require prior specification of the number of mixture components. We establish existence of the NPMLE and prove finite-sample parametric (up to logarithmic multiplicative factors) Hellinger error bounds for the predicted density functions. We also provide an effective procedure for computing the NPMLE without ad-hoc discretization and prove a theoretical convergence rate under certain assumptions. Numerical experiments on simulated data, for both discrete and non-discrete mixing distributions, demonstrate the strong performance of our approach. We also illustrate the approach on two real datasets.
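To make the convex-optimization formulation concrete, here is a minimal sketch of a *grid-based* NPMLE for a one-dimensional mixture of linear regressions. Note the assumptions: the paper's actual procedure avoids ad-hoc discretization, whereas this toy version fixes a grid of candidate slopes and a known noise level `sigma`, and solves the resulting convex weight problem with standard multiplicative (EM-style) fixed-point updates. All names and parameter choices below are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a two-component mixture of linear regressions:
# y = b * x + noise, with slope b drawn uniformly from {+1, -1}.
n = 400
x = rng.normal(size=n)
b_true = rng.choice([1.0, -1.0], size=n)
sigma = 0.3
y = b_true * x + sigma * rng.normal(size=n)

# Fixed grid of candidate slopes (a hypothetical discretization;
# the paper's method does not require this step).
grid = np.linspace(-3.0, 3.0, 121)

# Likelihood matrix: L[i, j] proportional to N(y_i | grid[j] * x_i, sigma^2).
L = np.exp(-0.5 * ((y[:, None] - x[:, None] * grid[None, :]) / sigma) ** 2)

# Mixing weights over the grid; the NPMLE objective
#   maximize sum_i log( (L @ w)_i )  subject to w in the simplex
# is concave in w, so simple fixed-point iterations suffice here.
w = np.full(len(grid), 1.0 / len(grid))
for _ in range(500):
    denom = L @ w                        # current mixture density at each point
    w *= (L.T @ (1.0 / denom)) / n       # multiplicative (EM-style) update
    w /= w.sum()                         # guard against round-off drift

# The fitted mixing distribution should place most of its mass
# near the true slopes +1 and -1.
```

The key point this sketch illustrates is that, once the atoms are fixed, the likelihood is concave in the mixing weights, which is what makes convex-optimization approaches to the NPMLE tractable; the paper's contribution includes handling the atom locations without such a pre-specified grid.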