🤖 AI Summary
Statistical inference for generalized linear models (GLMs) with high-dimensional longitudinal clustered data remains challenging, particularly under conventional sparsity assumptions.
Method: This paper proposes a desparsified Dantzig selector that relaxes the strict sparsity requirement and attains asymptotically optimal efficiency when the working correlation structure is correctly specified. By combining Dantzig selector-based regularization with a desparsification correction, the method provides unified inference for both linear and generalized linear models.
Contribution/Results: Consistency and asymptotic normality of the estimator are established under mild regularity conditions. Numerical experiments demonstrate superior finite-sample performance over existing methods for both continuous and binary outcomes. Applied to a genetics dataset with bacterial riboflavin production as the outcome, the method improves both accuracy and interpretability in inferring key gene effects.
📝 Abstract
In this paper, we consider statistical inference for generalized linear models in high dimensions under a longitudinal clustered data framework. Specifically, we propose a desparsified version of an initial Dantzig-type regularized estimator in regression settings and provide theoretical justification for both linear and generalized linear models. We present extensive numerical simulations demonstrating the effectiveness of our method for continuous and binary data. For continuous outcomes under linear models, we show that our estimator asymptotically attains an appropriate efficiency bound when the correlation structure is correctly specified. We conclude with an application of our method to a well-established genetics dataset, with bacterial riboflavin production as the outcome of interest.
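The two-stage idea summarized above (a Dantzig-type initial estimator followed by a desparsification correction) can be sketched in its simplest setting. This is a minimal illustration for the i.i.d. linear-model special case, not the paper's estimator: the simulated data, tuning level `lam`, and the pseudo-inverse used as a precision-matrix estimate are all illustrative assumptions, and the paper's method additionally handles longitudinal clustering and GLM link functions.

```python
# Illustrative sketch (NOT the paper's estimator): a Dantzig selector solved
# as a linear program, followed by a desparsification (debiasing) step, in
# the i.i.d. linear-model special case. Tuning choices are hypothetical.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

G = X.T @ X / n          # empirical Gram matrix
xty = X.T @ y / n
lam = 0.2                # illustrative tuning level

# Dantzig selector: minimize ||b||_1 subject to ||xty - G b||_inf <= lam.
# Split b = u - v with u, v >= 0 to obtain a standard-form LP.
A_ub = np.block([[G, -G], [-G, G]])
b_ub = np.concatenate([lam + xty, lam - xty])
res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * (2 * p))
beta_init = res.x[:p] - res.x[p:]

# Desparsification: one-step correction using an approximate inverse of G.
# With p < n and an exact inverse this reduces to OLS; in high dimensions a
# regularized precision-matrix estimate would be used instead.
Theta = np.linalg.pinv(G)
beta_debiased = beta_init + Theta @ (X.T @ (y - X @ beta_init)) / n
print(np.round(beta_debiased[:3], 2))
```

The debiased coordinates are no longer sparse, which is what makes coordinate-wise confidence intervals and tests possible; the asymptotic normality result summarized above is what licenses those intervals in the high-dimensional regime.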