Robust confidence intervals for generalized linear models

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the undercoverage of conventional confidence intervals for generalized linear models when the variance structure is misspecified, for example through overdispersion, heteroskedasticity, or unobserved biological variability. The authors propose a robust inference method that does not rely on correct specification of the variance structure. Hypothesis tests are constructed by sign-flipping individual score contributions, and the confidence interval is obtained by inverting these tests, with a bisection algorithm locating the endpoints. The resulting intervals are asymptotically valid even under misspecified variance models. In simulation studies, the proposed method achieves reliable coverage and outperforms standard Wald-type intervals when model assumptions are violated. The approach is illustrated in a differential expression analysis of cancer RNA-seq data, where heterogeneous variability is pervasive and parametric methods can yield inconsistent inference.

📝 Abstract

Reliable uncertainty quantification is a central challenge in the analysis of modern biomedical data, where complex sources of variability often violate standard modeling assumptions. In generalized linear models (GLMs), confidence intervals for regression parameters provide such information, but they typically rely on correct specification of the mean-variance relationship. However, overdispersion, heteroskedasticity, and unobserved biological variability can lead to substantial undercoverage in practice. We propose a method for constructing confidence intervals that remains valid under variance misspecification. The approach is based on the inversion of hypothesis tests obtained by sign-flipping individual score contributions and uses a bisection algorithm to determine the interval bounds. The resulting intervals inherit robustness properties from the underlying tests, and we establish their asymptotic validity under general variance misspecification. Through simulation studies, we show that the proposed method achieves reliable coverage and outperforms standard Wald-type intervals when model assumptions are violated. We illustrate the approach in a differential expression analysis of RNA-sequencing data from a cancer study, where heterogeneous variability is pervasive and parametric methods can yield inconsistent inference. The proposed framework provides a practical and robust alternative to conventional quasi-likelihood or Wald-based methods for interval estimation in GLMs, particularly suited to high-throughput biomedical applications.
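The test-inversion idea described in the abstract can be sketched in a few lines. The following is a minimal illustration under strong simplifying assumptions, not the authors' implementation: it uses a Poisson working model with a log link, a single coefficient, and no intercept or other nuisance parameters, so the score contributions are simply x_i (y_i - exp(beta0 x_i)). The function names (`score_contributions`, `sign_flip_pvalue`, `bisect`, `sign_flip_ci`), the number of flips, and the outward step size are all hypothetical choices made for this example. The same matrix of random signs is reused for every candidate value so that the p-value is a deterministic function of beta0.

```python
import numpy as np

def score_contributions(beta0, x, y):
    """Per-observation score contributions of the Poisson working model at beta0."""
    return x * (y - np.exp(beta0 * x))

def sign_flip_pvalue(beta0, x, y, flips):
    """Two-sided sign-flip p-value for H0: beta = beta0."""
    s = score_contributions(beta0, x, y)
    t_obs = abs(s.sum())
    t_flip = np.abs(flips @ s)  # one flipped statistic per row of signs
    # The +1 terms count the identity (no flip) transformation.
    return (1 + np.sum(t_flip >= t_obs)) / (1 + flips.shape[0])

def bisect(f, lo, hi, tol=1e-6):
    """Locate a sign change of f in [lo, hi]; f(lo) and f(hi) must differ in sign."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sign_flip_ci(x, y, alpha=0.05, n_flips=999, step=0.1, max_steps=200, seed=0):
    """Invert the sign-flip test: find where the p-value crosses alpha on each side."""
    rng = np.random.default_rng(seed)
    flips = rng.choice([-1.0, 1.0], size=(n_flips, len(y)))

    # Point estimate: root of the total score, which is decreasing in beta.
    beta_hat = bisect(lambda b: score_contributions(b, x, y).sum(), -10.0, 10.0)

    def excess(b):
        return sign_flip_pvalue(b, x, y, flips) - alpha

    def outer_point(direction):
        # Step away from beta_hat until the test rejects (assumed to happen).
        b = beta_hat
        for _ in range(max_steps):
            b += direction * step
            if excess(b) <= 0:
                return b
        raise RuntimeError("no rejected value found; widen the search")

    lower = bisect(excess, outer_point(-1.0), beta_hat)
    upper = bisect(excess, beta_hat, outer_point(+1.0))
    return beta_hat, lower, upper, flips

# Usage on simulated overdispersed counts (negative binomial, true beta = 0.5).
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1.0, 1.0, size=n)
mu = np.exp(0.5 * x)
size = 2.0  # smaller values mean stronger overdispersion
y = rng.negative_binomial(size, size / (size + mu))

beta_hat, lower, upper, flips = sign_flip_ci(x, y)
print(f"estimate {beta_hat:.3f}, 95% sign-flip interval [{lower:.3f}, {upper:.3f}]")
```

Because the p-value is a step function of beta0 and need not be monotone, bisection returns one crossing of the level alpha on each side of the estimate; a full treatment with nuisance parameters would also require the adjusted score contributions described in the paper, which this sketch omits.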

Problem

Research questions and friction points this paper is trying to address.

generalized linear models

confidence intervals

variance misspecification

overdispersion

heteroskedasticity

Innovation

Methods, ideas, or system contributions that make the work stand out.

robust confidence intervals

generalized linear models

variance misspecification