🤖 AI Summary
High-dimensional Gaussian graphical models are hard to interpret, and their parameter estimates become unstable when the number of parameters is large relative to the number of observations. To address these issues, we propose the Clusterpath estimator of the Gaussian Graphical Model (CGGM), which builds data-driven variable clustering into graphical model estimation. CGGM uses an aggregation penalty to group variables, which yields a block-structured precision matrix whose block structure is preserved in the covariance matrix. The estimator is formulated as the solution to a convex optimization problem, so other penalties are easy to incorporate; we illustrate this by combining the aggregation penalty with an ℓ₁ sparsity penalty. The estimator is computed with a cyclic block coordinate descent algorithm. In simulations, CGGM matches and often outperforms state-of-the-art methods for variable clustering in graphical models, and a diverse collection of empirical applications demonstrates its practical advantages and versatility.
📝 Abstract
Graphical models serve as effective tools for visualizing conditional dependencies between variables. However, as the number of variables grows, interpretation becomes increasingly difficult, and estimation uncertainty increases because the number of parameters is large relative to the number of observations. To address these challenges, we introduce the Clusterpath estimator of the Gaussian Graphical Model (CGGM), which encourages variable clustering in the graphical model in a data-driven way. Through the use of an aggregation penalty, we group variables together, which in turn results in a block-structured precision matrix whose block structure is preserved in the covariance matrix. The CGGM estimator is formulated as the solution to a convex optimization problem, making it easy to incorporate other popular penalization schemes, which we illustrate by combining an aggregation penalty with a sparsity penalty. We present a computationally efficient implementation of the CGGM estimator that uses a cyclic block coordinate descent algorithm. In simulations, we show that CGGM not only matches but often outperforms other state-of-the-art methods for variable clustering in graphical models. We also demonstrate CGGM's practical advantages and versatility on a diverse collection of empirical applications.
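The claim that a block-structured precision matrix keeps its block structure in the covariance matrix can be checked numerically. The following is a minimal sketch, not the CGGM estimator itself. It assumes one particular clustered parameterization, Θ = diag(a) + U R Uᵀ, where U is a cluster membership matrix; the names `labels`, `R`, `a`, `block_precision`, and `has_block_structure` are hypothetical and chosen for illustration only.

```python
import numpy as np


def block_precision(labels, R, a):
    """Build a clustered precision matrix Theta = diag(a[labels]) + U R U^T.

    Assumed parameterization for illustration: variables in the same cluster
    share one diagonal value and one off-diagonal value per cluster pair.
    """
    labels = np.asarray(labels)
    U = np.eye(R.shape[0])[labels]  # p x K cluster membership matrix
    return np.diag(a[labels]) + U @ R @ U.T


def has_block_structure(M, labels, tol=1e-10):
    """Return True if M is constant within every cluster-pair block.

    Diagonal entries are compared separately from off-diagonal entries.
    """
    labels = np.asarray(labels)
    off_diag = ~np.eye(len(labels), dtype=bool)
    clusters = np.unique(labels)
    for k in clusters:
        # Diagonal entries must agree within a cluster.
        d = np.diag(M)[labels == k]
        if d.max() - d.min() > tol:
            return False
        for l in clusters:
            # Off-diagonal entries must agree within each (k, l) block.
            block = np.outer(labels == k, labels == l) & off_diag
            vals = M[block]
            if vals.size and vals.max() - vals.min() > tol:
                return False
    return True


# Eight variables in three clusters (made-up values, not from the paper).
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
R = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.4, -0.1],
              [0.0, -0.1, 0.3]])
a = np.array([2.0, 1.5, 1.0])

theta = block_precision(labels, R, a)  # precision matrix
sigma = np.linalg.inv(theta)           # implied covariance matrix

print(has_block_structure(theta, labels))
print(has_block_structure(sigma, labels))
```

Under this assumed parameterization, the Woodbury identity gives an inverse of the same form, a cluster-wise diagonal plus a term U M Uᵀ, which is why the check passes for both matrices.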