🤖 AI Summary
Existing spatial regression methods struggle to model health disparities characterized by strong geographic heterogeneity, compositional income data, and discontinuous regional effects. This study proposes a geographically weighted penalized compositional regression model that combines a pairwise fusion penalty with the minimax concave penalty (MCP), thereby relaxing conventional assumptions of spatial smoothness and adjacency. The approach identifies clusters of regions with similar socioeconomic structures, even when those regions are not geographically contiguous. The work is presented as the first to introduce nonconvex regularization into compositional data regression, which allows it to capture discontinuous spatial heterogeneity. Applied to COPD prevalence and income composition in the United States, the method uncovers heterogeneous associations obscured by traditional models and improves estimation accuracy, interpretability, and scalability.
📝 Abstract
Income inequality is a major contributor to health disparities, yet its effects often vary by geography, and income itself is commonly represented as a compositional distribution (e.g., proportions of households across income brackets). Existing spatial regression methods struggle in this setting: they typically assume smooth spatial variation, cannot accommodate abrupt spatial heterogeneity, and lack principled treatment of compositional covariates. We propose a geographically weighted penalized compositional regression model that addresses these challenges simultaneously. Our method adopts a pairwise fusion penalty that enables detection of both contiguous and noncontiguous regional clusters with shared regression effects, thereby relaxing strong assumptions of spatial smoothness and geographic contiguity. This allows regions with similar underlying socioeconomic structures to be identified even when they are not geographically adjacent. By incorporating nonconvex penalties, such as the minimax concave penalty (MCP), the approach achieves improved estimation accuracy, interpretability, and scalability in high-dimensional spatial settings. We illustrate the method through an analysis linking U.S. income composition to chronic obstructive pulmonary disease (COPD) prevalence, revealing spatially heterogeneous associations that are obscured by conventional models. The proposed framework provides a flexible and robust tool for spatial data analysis involving compositional predictors and region-specific heterogeneity.
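To make the penalty concrete, the following is a minimal sketch of an MCP-based pairwise fusion term, not the authors' implementation. It uses the standard MCP definition and assumes the penalty is applied to the Euclidean distance between every pair of region-specific coefficient vectors with equal pair weights. The function names `mcp` and `pairwise_fusion_penalty`, the array `beta`, and the tuning parameters `lam` and `gamma` are hypothetical and do not come from the abstract, which does not spell out the exact objective or any geographic weighting of the pairs.

```python
import numpy as np


def mcp(t, lam, gamma):
    """Standard minimax concave penalty, evaluated elementwise on |t|.

    Equals lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam and the constant
    gamma*lam^2/2 beyond that point.
    """
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= gamma * lam,
        lam * t - t**2 / (2.0 * gamma),
        0.5 * gamma * lam**2,
    )


def pairwise_fusion_penalty(beta, lam, gamma):
    """Sum of MCP over all region pairs i < j of ||beta_i - beta_j||_2.

    beta has shape (n_regions, n_coefficients). Equal pair weights are an
    assumption made for illustration only.
    """
    beta = np.asarray(beta, dtype=float)
    i, j = np.triu_indices(beta.shape[0], k=1)  # all unordered pairs
    dists = np.linalg.norm(beta[i] - beta[j], axis=1)
    return float(mcp(dists, lam, gamma).sum())


# Three hypothetical regions: the first two share coefficients, the third differs.
beta = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
penalty = pairwise_fusion_penalty(beta, lam=1.0, gamma=3.0)
```

In this sketch, pairs with identical coefficients contribute nothing, and pairs whose difference exceeds `gamma * lam` contribute a constant. Because every pair of regions is penalized regardless of adjacency, regions can fuse into a common cluster without being neighbors, while clearly distinct regions are not pulled further toward each other.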