🤖 AI Summary
This paper addresses three key limitations of Beta regression for modeling continuous proportion data: sensitivity to distributional misspecification, poor handling of boundary values (0/1), and low computational efficiency. To overcome these, we propose a scalable, robust regression framework. Methodologically, we (1) introduce the continuous binomial (cobin) distribution and its dispersion-mixed variant (micobin), which naturally accommodate boundary values and enhance distributional robustness; (2) develop the Kolmogorov–Gamma data augmentation strategy, which enables efficient Gibbs sampling for Bayesian inference under hierarchical structures, including nested, longitudinal, and spatial designs; and (3) validate the framework via simulation studies and an empirical analysis of the benthic macroinvertebrate multimetric index of U.S. lakes using lake watershed covariates. Results demonstrate improved robustness of regression parameter estimates, the ability to handle responses exactly at the boundary, and better computational efficiency relative to standard Beta regression.
📝 Abstract
Beta regression is used routinely for continuous proportional data, but it often encounters practical issues such as a lack of robustness of regression parameter estimates to misspecification of the beta distribution. We develop an improved class of generalized linear models starting with the continuous binomial (cobin) distribution and further extending to dispersion mixtures of cobin distributions (micobin). The proposed cobin regression and micobin regression models have attractive robustness, computation, and flexibility properties. A key innovation is the Kolmogorov-Gamma data augmentation scheme, which facilitates Gibbs sampling for Bayesian computation, including in hierarchical cases involving nested, longitudinal, or spatial data. We demonstrate robustness, ability to handle responses exactly at the boundary (0 or 1), and computational efficiency relative to beta regression in simulation experiments and through analysis of the benthic macroinvertebrate multimetric index of US lakes using lake watershed covariates.