🤖 AI Summary
This work addresses heteroscedasticity and inhomogeneous covariate effects in streaming data, which existing online renewable quantile regression methods handle only at high computational cost because the check loss is non-smooth. The authors propose a novel online renewable estimation approach based on the smooth expectile loss. By combining incoming observations with summary statistics from historical data, the method updates the model without storing or revisiting individual-level historical records. Theoretical analysis establishes that the resulting estimator is consistent and asymptotically normal under mild regularity conditions, achieving the same statistical efficiency as the oracle estimator computed from the full dataset. Numerical experiments and real-data applications show that the proposed method performs comparably to the oracle estimator while keeping computational and storage costs low.
📝 Abstract
Streaming data often exhibit heterogeneity due to heteroscedastic variances or inhomogeneous covariate effects. Online renewable quantile and expectile regression methods provide valuable tools for detecting such heteroscedasticity by combining current data with summary statistics from historical data. However, quantile regression can be computationally demanding because of the non-smooth check function. To address this, we propose a novel online renewable method based on expectile regression, which efficiently updates estimates using both current observations and historical summaries, thereby reducing storage requirements. By exploiting the smoothness of the expectile loss function, our approach achieves superior computational efficiency compared with existing online renewable methods for streaming data with heteroscedastic variances or inhomogeneous covariate effects. We establish the consistency and asymptotic normality of the proposed estimator under mild regularity conditions, demonstrating that it achieves the same statistical efficiency as oracle estimators based on full individual-level data. Numerical experiments and real-data applications demonstrate that our method performs comparably to the oracle estimator while maintaining high computational efficiency and minimal storage costs.
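To make the idea of updating from current observations plus historical summaries concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a linear expectile model, the standard asymmetric squared loss with weights |tau - 1{r < 0}|, and a generic renewable-style update that keeps only the previous estimate and an accumulated weighted Gram matrix. The names `RenewableExpectile`, `expectile_weights`, and `expectile_loss` are hypothetical and chosen for illustration only.

```python
import numpy as np


def expectile_weights(residuals, tau):
    """Asymmetric weights: tau for residuals >= 0, 1 - tau for residuals < 0."""
    return np.where(residuals < 0, 1.0 - tau, tau)


def expectile_loss(residuals, tau):
    """Mean asymmetric squared loss; smooth in the residuals, unlike the check loss."""
    return np.mean(expectile_weights(residuals, tau) * residuals ** 2)


class RenewableExpectile:
    """Illustrative renewable-style estimator (assumed form, not the paper's).

    Only two summaries are stored between batches: the current coefficient
    vector `beta` and the accumulated weighted Gram matrix `J`.
    """

    def __init__(self, n_features, tau=0.5):
        self.tau = tau
        self.beta = np.zeros(n_features)
        self.J = np.zeros((n_features, n_features))

    def update(self, X, y, max_iter=100, tol=1e-10):
        # Contribution of all past batches, summarized by J and the old beta.
        rhs_hist = self.J @ self.beta
        beta = self.beta.copy()
        for _ in range(max_iter):
            # Weights depend on the sign of the current-batch residuals.
            w = expectile_weights(y - X @ beta, self.tau)
            XtW = X.T * w  # shape (p, n): each column scaled by its weight
            # Weighted least-squares step combining history and the new batch.
            beta_new = np.linalg.solve(self.J + XtW @ X, rhs_hist + XtW @ y)
            converged = np.max(np.abs(beta_new - beta)) < tol
            beta = beta_new
            if converged:
                break
        # Fold the new batch into the summary; the raw batch can be discarded.
        w = expectile_weights(y - X @ beta, self.tau)
        self.J = self.J + (X.T * w) @ X
        self.beta = beta
        return self.beta
```

In this sketch, calling `update(X_batch, y_batch)` once per arriving batch is all that is needed, and the memory footprint stays at one p-vector and one p-by-p matrix regardless of how many batches have been seen. With `tau=0.5` the weights are constant and the update reduces to recursive least squares, which gives a simple sanity check against ordinary least squares on the pooled data.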