🤖 AI Summary
This paper addresses modeling population heterogeneity in mixture-of-linear-regression (i.e., random-coefficient) models. The authors propose a fully nonparametric maximum likelihood estimator (NPMLE) for the unknown mixing distribution $G^*$, without prespecifying its parametric form or the number of components. The method computes the NPMLE directly via convex optimization, establishes the estimator's existence, and proves a finite-sample Hellinger error bound at the parametric rate (up to logarithmic factors), avoiding the discretization-induced bias of conventional approaches. The estimator achieves both statistical efficiency and computational tractability: it outperforms EM-based parametric methods on simulations with both discrete and continuous mixing distributions as well as on two real-world datasets, demonstrating robustness and practical utility.
📝 Abstract
Mixtures of regression models are useful for regression analysis in heterogeneous populations, where a single regression model may not be appropriate for the entire population. We study the nonparametric maximum likelihood estimator (NPMLE) for fitting these models. The NPMLE is based on convex optimization and does not require prior specification of the number of mixture components. We establish existence of the NPMLE and prove finite-sample parametric (up to logarithmic multiplicative factors) Hellinger error bounds for the predicted density functions. We also provide an effective procedure for computing the NPMLE without ad-hoc discretization and prove a theoretical convergence rate under certain assumptions. Numerical experiments on simulated data, for both discrete and non-discrete mixing distributions, demonstrate the strong performance of our approach. We also illustrate the approach on two real datasets.
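To make the convex-optimization formulation concrete, here is a minimal sketch of a *grid-based* NPMLE for a one-dimensional mixture of linear regressions. Note the assumptions: the paper's actual procedure avoids ad-hoc discretization, whereas this toy version fixes a grid of candidate slopes and a known noise level `sigma`, and solves the resulting convex weight problem with standard multiplicative (EM-style) fixed-point updates. All names and parameter choices below are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a two-component mixture of linear regressions:
# y = b * x + noise, with slope b drawn uniformly from {+1, -1}.
n = 400
x = rng.normal(size=n)
b_true = rng.choice([1.0, -1.0], size=n)
sigma = 0.3
y = b_true * x + sigma * rng.normal(size=n)

# Fixed grid of candidate slopes (a hypothetical discretization;
# the paper's method does not require this step).
grid = np.linspace(-3.0, 3.0, 121)

# Likelihood matrix: L[i, j] proportional to N(y_i | grid[j] * x_i, sigma^2).
L = np.exp(-0.5 * ((y[:, None] - x[:, None] * grid[None, :]) / sigma) ** 2)

# Mixing weights over the grid; the NPMLE objective
#   maximize sum_i log( (L @ w)_i )  subject to w in the simplex
# is concave in w, so simple fixed-point iterations suffice here.
w = np.full(len(grid), 1.0 / len(grid))
for _ in range(500):
    denom = L @ w                        # current mixture density at each point
    w *= (L.T @ (1.0 / denom)) / n       # multiplicative (EM-style) update
    w /= w.sum()                         # guard against round-off drift

# The fitted mixing distribution should place most of its mass
# near the true slopes +1 and -1.
```

The key point this sketch illustrates is that, once the atoms are fixed, the likelihood is concave in the mixing weights, which is what makes convex-optimization approaches to the NPMLE tractable; the paper's contribution includes handling the atom locations without such a pre-specified grid.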