Scalable and robust regression models for continuous proportional data

📅 2025-04-21
🤖 AI Summary
This paper addresses three key limitations of beta regression for modeling continuous proportional data: sensitivity to distributional misspecification, poor handling of boundary values (exactly 0 or 1), and computational inefficiency. To overcome these, the authors propose a scalable, robust regression framework. Methodologically, they (1) introduce the continuous binomial (cobin) distribution and its dispersion-mixture extension (micobin), which accommodate boundary values and improve robustness to misspecification; (2) develop the Kolmogorov-Gamma data augmentation scheme, which enables Gibbs sampling for Bayesian inference, including under hierarchical structures with nested, longitudinal, or spatial data; and (3) evaluate the framework through simulation studies and an analysis of the benthic macroinvertebrate multimetric index of US lakes using lake watershed covariates. The results show more robust regression parameter estimates, the ability to handle responses exactly at the boundary, and improved computational efficiency relative to beta regression.

📝 Abstract
Beta regression is used routinely for continuous proportional data, but it often encounters practical issues such as a lack of robustness of regression parameter estimates to misspecification of the beta distribution. We develop an improved class of generalized linear models starting with the continuous binomial (cobin) distribution and further extending to dispersion mixtures of cobin distributions (micobin). The proposed cobin regression and micobin regression models have attractive robustness, computation, and flexibility properties. A key innovation is the Kolmogorov-Gamma data augmentation scheme, which facilitates Gibbs sampling for Bayesian computation, including in hierarchical cases involving nested, longitudinal, or spatial data. We demonstrate robustness, ability to handle responses exactly at the boundary (0 or 1), and computational efficiency relative to beta regression in simulation experiments and through analysis of the benthic macroinvertebrate multimetric index of US lakes using lake watershed covariates.
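
The boundary issue mentioned above can be illustrated with a small sketch. It evaluates the beta log-density in the mean-precision form commonly used for beta regression and shows that it is not finite at y = 0. For contrast, it also evaluates an exponential-tilt density on [0, 1], proportional to exp(θy), which stays finite at both endpoints. The function names `beta_logpdf` and `tilt_logpdf` are hypothetical, and the tilt density is only a stand-in chosen for illustration: the abstract does not give the cobin density, so nothing here should be read as the paper's definition.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    if x == 0.0:
        return 0.0
    if y == 0.0:
        return -math.inf if x > 0 else math.inf
    return x * math.log(y)


def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y in [0, 1]."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + _xlogy(a - 1.0, y) + _xlogy(b - 1.0, 1.0 - y)


def tilt_logpdf(y, theta):
    """Log-density of f(y) = theta * exp(theta * y) / (exp(theta) - 1) on [0, 1].

    Illustrative stand-in only (an assumption, not taken from the abstract).
    """
    if not 0.0 <= y <= 1.0:
        return -math.inf
    if abs(theta) < 1e-8:
        # Near theta = 0 the log-normalizer is approximately theta / 2.
        return theta * (y - 0.5)
    return theta * y - math.log(math.expm1(theta) / theta)


# A response exactly at the boundary breaks the beta likelihood ...
boundary_beta = beta_logpdf(0.0, mu=0.3, phi=5.0)
# ... while the tilt density remains finite at both endpoints.
boundary_tilt = (tilt_logpdf(0.0, theta=1.2), tilt_logpdf(1.0, theta=1.2))
print(boundary_beta, boundary_tilt)
```

In practice, beta regression users often work around this by nudging 0s and 1s into the open interval, which is one reason a likelihood that is well defined on the closed interval is attractive.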
Problem

Research questions and friction points this paper is trying to address.

Develop robust regression models for proportional data
Address beta regression limitations via cobin distributions
Enable efficient Bayesian computation for complex data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses cobin and micobin distributions for robustness
Implements Kolmogorov-Gamma data augmentation scheme
Enhances Bayesian computation via Gibbs sampling

Changwoo J. Lee
Postdoctoral Associate, Duke University
Probabilistic machine learning, Bayesian statistics, Environmental epidemiology, Clustering
Benjamin K. Dahl
Department of Statistical Science, Duke University
Otso Ovaskainen
University of Jyväskylä
Statistical Ecology
David B. Dunson
Department of Statistical Science, Duke University