🤖 AI Summary
Quantifying disparities among supervised learning problems remains challenging due to the lack of a principled metric that jointly accounts for sampling bias, label noise, data scarcity, and model approximation.
Method: We propose the Risk distance, a metric grounded in optimal transport that quantifies how far perturbations such as sampling bias, label noise, data scarcity, and model approximation can move a problem. We construct the first metric space over supervised learning problems, derive explicit geodesics, and prove that classification problems are dense in a larger class of problems. Two variants are introduced: one that incorporates specified weights on a problem's predictors, and one that is more sensitive to the contours of a problem's risk landscape.
Contribution/Results: The framework combines optimal transport, risk minimization, and differential geometry to establish stability guarantees and to reveal the intrinsic geometric structure of the space of supervised learning problems, providing a foundation for analyses in robust and transfer learning.
📝 Abstract
We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This optimal-transport-inspired distance facilitates stability results; one can quantify how seriously issues like sampling bias, noise, limited data, and approximations might change a given problem by bounding how much these modifications can move the problem under the Risk distance. With the distance established, we explore the geometry of the resulting space of supervised learning problems, providing explicit geodesics and proving that the set of classification problems is dense in a larger class of problems. We also provide two variants of the Risk distance: one that incorporates specified weights on a problem's predictors, and one that is more sensitive to the contours of a problem's risk landscape.
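The stability idea in the abstract can be made concrete with a toy sketch. The snippet below uses the plain one-dimensional 1-Wasserstein distance on label marginals as a stand-in for the paper's Risk distance (which is not defined here), and the regression setup (targets y = sin(x), Gaussian label noise) is a hypothetical example: adding stronger label noise moves the perturbed problem farther from the original under a transport-type distance.

```python
import numpy as np

def empirical_w1(a, b):
    """1-D 1-Wasserstein distance between equal-size empirical samples.
    In one dimension, optimal transport reduces to matching sorted
    order statistics, so the distance is the mean absolute gap."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.sin(x)  # clean labels for a toy regression problem

# Perturb the labels with Gaussian noise of growing strength and measure
# how far the label distribution moves under the transport distance.
sigmas = (0.0, 0.1, 0.5)
dists = [empirical_w1(y, y + rng.normal(scale=s, size=y.shape)) for s in sigmas]
for s, d in zip(sigmas, dists):
    print(f"sigma={s:.1f}  W1 shift={d:.4f}")
```

The monotone growth of the shift with the noise level is the pattern the paper's stability bounds formalize: a modification of bounded strength can only move a problem a correspondingly bounded distance.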