🤖 AI Summary
To address degraded model generalization under temporal distribution shift in dynamically evolving environments, this paper proposes RIDER (RIsk minimization under Dynamically Evolving Regimes), a method grounded in the random distribution shift model. RIDER derives optimally weighted empirical risk minimization procedures and shows that common weighting schemes, namely pooling all data, exponentially weighting data, and using only the most recent data, arise as special cases of a single framework. Applied as a fine-tuning step, RIDER consistently improves out-of-sample predictive performance on the Yearbook dataset across a range of benchmark methods in Wild-Time. It also outperforms standard weighting strategies on two further real-world tasks: stock market volatility prediction and ride duration forecasting on NYC taxi data.
📝 Abstract
Temporal distribution shifts pose a key challenge for machine learning models trained and deployed in dynamically evolving environments. This paper introduces RIDER (RIsk minimization under Dynamically Evolving Regimes), which derives optimally weighted empirical risk minimization procedures under temporal distribution shifts. Our approach is theoretically grounded in the random distribution shift model, where random shifts arise as a superposition of numerous unpredictable changes in the data-generating process. We show that common weighting schemes, such as pooling all data, exponentially weighting data, and using only the most recent data, emerge naturally as special cases in our framework. We demonstrate that RIDER consistently improves out-of-sample predictive performance when applied as a fine-tuning step on the Yearbook dataset, across a range of benchmark methods in Wild-Time. Moreover, we show that RIDER outperforms standard weighting strategies in two other real-world tasks: predicting stock market volatility and forecasting ride durations in NYC taxi data.
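The three weighting schemes named in the abstract can be illustrated with a minimal sketch of weighted empirical risk minimization for a linear model. This is not RIDER itself: the optimal weights derived in the paper are not reproduced here. The function names, the squared-error loss, the `decay` and `window` parameters, and the synthetic drifting-slope data are all assumptions made for illustration.

```python
import numpy as np

def pooled_weights(n_periods):
    """Equal weight on every time period (pool all data)."""
    return np.full(n_periods, 1.0 / n_periods)

def exponential_weights(n_periods, decay):
    """Weight proportional to decay**age; age 0 is the most recent period."""
    ages = np.arange(n_periods - 1, -1, -1)  # oldest period comes first
    w = decay ** ages
    return w / w.sum()

def recent_window_weights(n_periods, window):
    """Equal weight on the last `window` periods (1 <= window <= n_periods)."""
    w = np.zeros(n_periods)
    w[n_periods - window:] = 1.0
    return w / w.sum()

def weighted_least_squares(X, y, period, period_weights):
    """Minimize sum_i w[period[i]] * (y_i - x_i @ beta)**2 over beta."""
    sample_w = period_weights[period]
    keep = sample_w > 0  # zero-weight periods contribute nothing to the risk
    root_w = np.sqrt(sample_w[keep])
    beta, *_ = np.linalg.lstsq(
        X[keep] * root_w[:, None], y[keep] * root_w, rcond=None
    )
    return beta

# Synthetic toy data (hypothetical): the slope drifts upward over 10 periods.
rng = np.random.default_rng(0)
n_periods, n_per_period = 10, 50
period = np.repeat(np.arange(n_periods), n_per_period)
slope = 1.0 + 0.2 * period
x = rng.normal(size=period.size)
X = np.column_stack([np.ones_like(x), x])
y = slope * x + rng.normal(scale=0.1, size=x.size)

schemes = {
    "pooled": pooled_weights(n_periods),
    "exponential": exponential_weights(n_periods, decay=0.5),
    "recent window": recent_window_weights(n_periods, window=1),
}
for name, w in schemes.items():
    beta = weighted_least_squares(X, y, period, w)
    print(f"{name:>14}: intercept={beta[0]:.3f}, slope={beta[1]:.3f}")
```

In this sketch, setting `decay=1.0` or `window=n_periods` recovers the pooled weights, which shows how the schemes can be viewed as points in one family of period-level weights. How to choose among them is the question the paper's framework addresses.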