🤖 AI Summary
Traditional PAC learning theory and static Empirical Risk Minimization (ERM) are fundamentally inadequate for dynamic learning settings where data distributions and task objectives evolve over time.
Method: We propose a prospective learning framework that explicitly incorporates time as an input variable, enabling the construction of sequential predictors. We introduce a theoretically grounded learner tailored to time-varying environments, Prospective ERM, and prove that its risk converges to the Bayes risk of the time-varying problem.
Contribution/Results: Our analysis formally establishes that static ERM can fail to learn under distributional drift. Modeling the data as a time-varying stochastic process, we derive generalization bounds; experiments on synthetic benchmarks and temporally extended MNIST/CIFAR-10 tasks demonstrate substantial improvements over standard ERM. The framework offers a principled paradigm for learning in non-stationary environments, combining theoretical guarantees with empirical gains.
📝 Abstract
In real-world applications, the distribution of the data, and our goals, evolve over time. The prevailing theoretical framework for studying machine learning, namely probably approximately correct (PAC) learning, largely ignores time. As a consequence, existing strategies to address the dynamic nature of data and goals exhibit poor real-world performance. This paper develops a theoretical framework called "Prospective Learning" that is tailored for situations when the optimal hypothesis changes over time. In PAC learning, empirical risk minimization (ERM) is known to be consistent. We develop a learner called Prospective ERM, which returns a sequence of predictors that make predictions on future data. We prove that the risk of prospective ERM converges to the Bayes risk under certain assumptions on the stochastic process generating the data. Prospective ERM, roughly speaking, incorporates time as an input in addition to the data. We show that standard ERM as done in PAC learning, without incorporating time, can result in failure to learn when distributions are dynamic. Numerical experiments illustrate that prospective ERM can learn synthetic and visual recognition problems constructed from MNIST and CIFAR-10. Code at https://github.com/neurodata/prolearn.
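To make the "time as an input" idea concrete, here is a minimal sketch (not the paper's implementation) of the failure mode and its fix on a toy non-stationary stream: the optimal label flips at every time step, so any time-agnostic ERM predictor is stuck near chance, while a predictor that also sees a time feature (here `t % 2`, a hypothetical choice for this toy drift) recovers the pattern. The majority-vote lookup table is a stand-in for ERM over a discrete hypothesis class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic non-stationary stream: at time t, y = x when t is even,
# y = 1 - x when t is odd (x in {0, 1}), so the optimal rule flips each step.
T = 2000
x = rng.integers(0, 2, size=T)
t = np.arange(T)
y = np.where(t % 2 == 0, x, 1 - x)

def fit_table(features, labels):
    """Majority-vote lookup over discrete feature tuples (a stand-in for ERM)."""
    buckets = {}
    for f, lab in zip(map(tuple, features), labels):
        buckets.setdefault(f, []).append(lab)
    return {f: int(np.mean(v) >= 0.5) for f, v in buckets.items()}

# Standard ERM ignores time: each x sees labels 0 and 1 equally often.
erm = fit_table(x[:, None], y)
# "Prospective" variant: include a time feature (t % 2) as an extra input.
perm = fit_table(np.stack([x, t % 2], axis=1), y)

# Evaluate on future time steps t = T, ..., 2T - 1.
x_new = rng.integers(0, 2, size=T)
t_new = np.arange(T, 2 * T)
y_new = np.where(t_new % 2 == 0, x_new, 1 - x_new)

acc_erm = np.mean([erm[(xi,)] == yi for xi, yi in zip(x_new, y_new)])
acc_perm = np.mean([perm[(xi, ti % 2)] == yi
                    for xi, ti, yi in zip(x_new, t_new, y_new)])
print(f"ERM accuracy: {acc_erm:.2f}, time-aware accuracy: {acc_perm:.2f}")
```

In this toy setting the time-agnostic predictor hovers around 50% accuracy on future data, while the time-aware one is essentially perfect; the paper's prospective ERM generalizes this idea to sequences of predictors under assumptions on the data-generating stochastic process.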