Adaptive debiased SGD in high-dimensional GLMs with streaming data

📅 2024-05-28
🤖 AI Summary
Problem: Updating regression coefficients and their standard errors in real time for high-dimensional generalized linear models with streaming data is both computationally and statistically challenging. Method: The authors propose a single-pass online algorithm that combines adaptive stochastic gradient descent, designed for a dynamically changing loss, with a novel online debiasing procedure. The result is the Adaptive Debiased Lasso (ADL) estimator, whose asymptotic normality they establish. Contribution/Results: ADL stores neither the raw data nor high-dimensional summary statistics, which reduces both time and space complexity. In simulations and a spam email classification task, ADL is reported to outperform baseline methods in statistical accuracy (for example, confidence intervals that attain nominal coverage) and in computational efficiency (several-fold speedups over existing approaches). The summary presents it as the first streaming method for high-dimensional sparse inference that pairs theoretical guarantees (asymptotic normality and valid uncertainty quantification) with practical feasibility.

📝 Abstract
Online statistical inference enables real-time analysis of sequentially collected data, which distinguishes it from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, in which regression coefficient estimates and their standard errors are updated upon each new data arrival. In contrast to existing methods that require either access to the full dataset or storage of high-dimensional summary statistics, our method operates in a single-pass mode, significantly reducing both time and space complexity. The core of our methodological innovation is an adaptive stochastic gradient descent algorithm tailored to dynamic objective functions, coupled with a novel online debiasing procedure. This allows us to maintain only low-dimensional summary statistics while effectively controlling the optimization error introduced by the dynamically changing loss functions. We establish the asymptotic normality of the proposed Adaptive Debiased Lasso (ADL) estimator. Extensive simulation experiments demonstrate the statistical validity and computational efficiency of the ADL estimator across various settings. Its computational efficiency is further demonstrated in a real-data application to spam email classification.
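To make the single-pass idea concrete, here is a minimal sketch of a streaming sparse logistic regression fitted by proximal SGD with AdaGrad-style per-coordinate step sizes. It is an illustration under stated assumptions, not the paper's ADL algorithm: the abstract does not give the update rules, so the step-size scheme, the soft-thresholding step, and all names (`OnlineSparseLogistic`, `simulate_stream`, `base_step`, `penalty`) are hypothetical, and the online debiasing and standard-error updates are omitted entirely. What it does show is the memory profile the abstract describes: each observation is seen once and discarded, and the stored state is only two length-p vectors.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


class OnlineSparseLogistic:
    """Single-pass proximal SGD with AdaGrad-style step sizes.

    Hypothetical illustration only: this is NOT the ADL estimator and it
    performs no debiasing. The state is O(p): the coefficient vector plus
    one accumulated squared gradient per coordinate.
    """

    def __init__(self, dim, base_step=0.5, penalty=0.005, eps=1e-8):
        self.beta = [0.0] * dim
        self.grad_sq = [0.0] * dim
        self.base_step = base_step
        self.penalty = penalty
        self.eps = eps

    def predict_proba(self, x):
        z = sum(b * xi for b, xi in zip(self.beta, x))
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x, y):
        """Consume one observation (x, y) with y in {0, 1}, then forget it."""
        residual = self.predict_proba(x) - y
        for j, xj in enumerate(x):
            g = residual * xj  # gradient of the logistic loss in coordinate j
            self.grad_sq[j] += g * g
            step = self.base_step / (math.sqrt(self.grad_sq[j]) + self.eps)
            self.beta[j] = soft_threshold(self.beta[j] - step * g,
                                          step * self.penalty)


def simulate_stream(n, true_beta, seed=0):
    """Yield n synthetic (x, y) pairs from a logistic model, one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_beta]
        z = sum(b * xi for b, xi in zip(true_beta, x))
        p = 1.0 / (1.0 + math.exp(-z))
        yield x, (1 if rng.random() < p else 0)
```

A typical use is to loop over the stream and call `model.update(x, y)` once per observation. The per-coordinate step sizes are one common way to make SGD adaptive; the paper's own adaptation to the dynamically changing loss may differ.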
Problem

Research questions and friction points this paper is trying to address.

Adaptive debiased SGD
high-dimensional GLMs
streaming data analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive stochastic gradient descent
Online debiasing procedure
Single-pass mode operation
Ruijian Han
Department of Applied Mathematics, The Hong Kong Polytechnic University
Lan Luo
Assistant Professor of Biostatistics, Rutgers University
Streaming data, online statistical inference, mobile health, mediation analysis
Yuanhang Luo
Department of Applied Mathematics, The Hong Kong Polytechnic University
Yuanyuan Lin
The Chinese University of Hong Kong
Statistics
Jian Huang
Department of Applied Mathematics, The Hong Kong Polytechnic University