🤖 AI Summary
Traditional empirical risk minimization (ERM) suffers from poor generalization in over-parameterized regimes, struggling to balance fitting accuracy and robustness. Method: We propose functional risk minimization (FRM), a unified framework that defines risk over function space—not pointwise predictions—treating function approximation as the fundamental optimization unit. FRM constructs structured risk via parameterized function families $f_{\theta_i}$ and incorporates parsimony regularization, recovering ERM as a special case while more faithfully modeling noise processes. Contribution/Results: FRM provides a “minimal-function-fitting” interpretation of generalization for over-parameterized models. We prove its theoretical generality—encompassing major loss functions—and demonstrate empirically that FRM consistently outperforms ERM baselines across supervised, unsupervised, and reinforcement learning tasks, with particularly pronounced gains in over-parameterized settings.
📝 Abstract
The field of Machine Learning has changed significantly since the 1970s. However, its most basic principle, Empirical Risk Minimization (ERM), remains unchanged. We propose Functional Risk Minimization~(FRM), a general framework where losses compare functions rather than outputs. This results in better performance in supervised, unsupervised, and RL experiments. In the FRM paradigm, for each data point $(x_i,y_i)$ there is a function $f_{\theta_i}$ that fits it: $y_i = f_{\theta_i}(x_i)$. This allows FRM to subsume ERM for many common loss functions and to capture more realistic noise processes. We also show that FRM provides an avenue towards understanding generalization in the modern over-parameterized regime, as its objective can be framed as finding the simplest model that fits the training data.
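The core contrast between the two paradigms can be illustrated with a toy sketch. This is a hypothetical one-dimensional example (not the paper's actual formulation): for a scalar model $f_\theta(x) = \theta x$, each data point $(x_i, y_i)$ with $x_i \neq 0$ determines a unique per-point parameter $\theta_i = y_i / x_i$ satisfying $y_i = f_{\theta_i}(x_i)$, so an FRM-style loss can compare $\theta$ to each $\theta_i$ in parameter (function) space, whereas ERM compares outputs $f_\theta(x_i)$ to targets $y_i$:

```python
import numpy as np

# Toy illustration (not the paper's formulation): model f_theta(x) = theta * x.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=100)          # keep x away from 0
theta_true = 3.0
y = theta_true * x + rng.normal(0.0, 0.1, size=100)

def erm_loss(theta):
    # ERM: compare model outputs to targets in output space.
    return np.mean((theta * x - y) ** 2)

def frm_loss(theta):
    # FRM-style sketch: each (x_i, y_i) is fit exactly by theta_i = y_i / x_i;
    # the loss compares theta to these per-point parameters in function space.
    theta_i = y / x
    return np.mean((theta - theta_i) ** 2)

grid = np.linspace(0.0, 6.0, 601)
print(grid[np.argmin([erm_loss(t) for t in grid])])  # near theta_true
print(grid[np.argmin([frm_loss(t) for t in grid])])  # near theta_true
```

In this linear case both objectives recover roughly the same minimizer, but they weight the data differently: ERM penalizes output-space residuals, while the FRM-style loss penalizes disagreement between the shared function and each per-point fitting function.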