Feasible Learning

📅 2025-01-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional empirical risk minimization (ERM) optimizes average loss and overlooks per-sample performance, leading to poor tail robustness. To address this, we propose a sample-centric learning paradigm that formulates training as a feasibility problem over all training samples, explicitly constraining the loss of each sample to remain below an adaptive threshold. We introduce a minimum-norm relaxation mechanism to automatically determine principled, instance-aware thresholds, and design a primal-dual algorithm with dynamic sample reweighting for efficient optimization. Our method leaves average performance nearly unchanged while substantially improving tail robustness across diverse tasks, including image classification, age regression, and preference optimization in large language models. By prioritizing uniform performance across the data distribution rather than solely minimizing aggregate loss, it offers a new approach to fairness, reliability, and long-tail generalization.
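The constrained formulation described above can be sketched as follows (the notation is assumed for illustration, not taken verbatim from the paper):

```latex
% Feasible Learning seeks any parameter vector \theta whose per-sample
% losses all fall below a threshold \epsilon; the minimum-norm relaxation
% adds nonnegative slacks s_i and penalizes their norm, so thresholds
% adapt per instance when strict feasibility is unattainable.
\begin{aligned}
\min_{\theta,\, s} \quad & \tfrac{1}{2}\,\lVert s \rVert_2^2 \\
\text{s.t.} \quad & \ell(\theta; x_i, y_i) \le \epsilon + s_i, \qquad
  s_i \ge 0, \quad i = 1, \dots, n.
\end{aligned}
```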

📝 Abstract
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. In contrast to the ubiquitous Empirical Risk Minimization (ERM) framework, which optimizes for average performance, FL demands satisfactory performance on every individual data point. Since any model that meets the prescribed performance threshold is a valid FL solution, the choice of optimization algorithm and its dynamics play a crucial role in shaping the properties of the resulting solutions. In particular, we study a primal-dual approach which dynamically re-weights the importance of each sample during training. To address the challenge of setting a meaningful threshold in practice, we introduce a relaxation of FL that incorporates slack variables of minimal norm. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
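The primal-dual dynamics mentioned in the abstract can be illustrated with a minimal sketch on a toy linear-regression task. This is a hedged illustration, not the paper's implementation: the data, step sizes, and threshold `eps` are all assumptions chosen for the example. Each sample carries a dual variable `lam[i]` that acts as its weight; the weight grows while that sample's loss violates its constraint and shrinks back toward zero once the constraint is satisfied.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression task (hypothetical setup, not from the paper).
n, d = 20, 2
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -1.0])
y = X @ w_true + 0.05 * rng.normal(size=n)

eps = 0.05        # per-sample loss threshold: require loss_i <= eps
eta_w = 0.05      # primal (model) step size
eta_lam = 0.1     # dual (multiplier) step size

w = np.zeros(d)
lam = np.ones(n)  # one dual variable per sample, acting as its weight

for _ in range(5000):
    resid = X @ w - y
    losses = 0.5 * resid**2
    # Primal step: gradient descent on the lambda-weighted average loss.
    w -= (eta_w / n) * (X.T @ (lam * resid))
    # Dual step: lambda_i grows while sample i violates its constraint
    # and decays toward 0 once the constraint is satisfied.
    lam = np.maximum(0.0, lam + eta_lam * (losses - eps))

# Worst-case per-sample loss; it should fall below eps once the
# iterates reach a feasible point.
max_loss = float((0.5 * (X @ w - y) ** 2).max())
```

Note the contrast with ERM: gradient descent on the average loss would weight every sample equally throughout training, whereas here hard samples automatically receive larger weights until their constraints are met, which is what shapes the improved tail behavior.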
Problem

Research questions and friction points this paper is trying to address.

Robust Learning
Uniform Error Bound
Extreme Case Handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feasible Learning
Dynamic Adjustment
Robustness to Outliers