Beyond the Loss Curve: Scaling Laws, Active Learning, and the Limits of Learning from Exact Posteriors

πŸ“… 2026-01-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of evaluating the gap between neural network performance and theoretical optimality on real-world image tasks, a gap existing benchmarks cannot measure because they lack tractable true posterior distributions. The authors construct, for the first time, β€œoracle” models based on class-conditional normalizing flows for real datasets such as AFHQ and ImageNet, making exact posteriors computable. Leveraging this framework, they systematically analyze learning limits, scaling laws, soft-label training, distribution shift, and active learning. Key findings include a power-law decay of epistemic error with dataset size, superior performance of ResNets over Vision Transformers in low-data regimes, significant gains in calibration and accuracy from soft-label training, greater sensitivity to the type of distribution shift than to its magnitude, and markedly better active learning efficiency when guided by epistemic uncertainty. The study advances how model evaluation is done in deep learning.

πŸ“ Abstract
How close are neural networks to the best they could possibly do? Standard benchmarks cannot answer this because they lack access to the true posterior p(y|x). We use class-conditional normalizing flows as oracles that make exact posteriors tractable on realistic images (AFHQ, ImageNet). This enables five lines of investigation.

Scaling laws: Prediction error decomposes into irreducible aleatoric uncertainty and reducible epistemic error; the epistemic component follows a power law in dataset size, continuing to shrink even when total loss plateaus.

Limits of learning: The aleatoric floor is exactly measurable, and architectures differ markedly in how they approach it: ResNets exhibit clean power-law scaling while Vision Transformers stall in low-data regimes.

Soft labels: Oracle posteriors contain learnable structure beyond class labels: training with exact posteriors outperforms hard labels and yields near-perfect calibration.

Distribution shift: The oracle computes exact KL divergence for controlled perturbations, revealing that shift type matters more than shift magnitude: class imbalance barely affects accuracy at divergence values where input noise causes catastrophic degradation.

Active learning: Exact epistemic uncertainty distinguishes genuinely informative samples from inherently ambiguous ones, improving sample efficiency.

Our framework reveals that standard metrics hide ongoing learning, mask architectural differences, and cannot diagnose the nature of distribution shift.
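The error decomposition at the heart of the abstract (total loss = aleatoric floor + epistemic KL term) can be sketched numerically. This is a minimal illustration, not the paper's method: the "oracle" posterior below is a hypothetical random softmax stand-in for the class-conditional normalizing flow, and the "model" is just the oracle smoothed toward uniform.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(p_true, q_pred):
    """Mean cross-entropy H(p, q) over samples, in nats."""
    return float(-(p_true * np.log(q_pred + 1e-12)).sum(axis=1).mean())

# Hypothetical oracle posteriors p(y|x) for 1000 inputs over 5 classes.
logits = rng.normal(size=(1000, 5))
p_oracle = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Irreducible (aleatoric) floor: the entropy of the oracle posterior itself.
aleatoric = cross_entropy(p_oracle, p_oracle)

# A stand-in model: oracle smoothed toward uniform (imitating estimation error).
p_model = 0.7 * p_oracle + 0.3 / 5
total = cross_entropy(p_oracle, p_model)

# Epistemic error = total loss minus the aleatoric floor; this equals the mean
# KL(p_oracle || p_model), so it is nonnegative and shrinks as the model improves.
epistemic = total - aleatoric
print(f"total={total:.3f}  aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}")
```

With an exact oracle, the aleatoric term is computable rather than estimated, which is what lets the paper track the epistemic component separately even after total loss plateaus.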
Problem

Research questions and friction points this paper is trying to address.

posterior estimation
scaling laws
distribution shift
active learning
neural network limits
Innovation

Methods, ideas, or system contributions that make the work stand out.

exact posterior
scaling laws
aleatoric uncertainty
active learning
distribution shift
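The soft-label finding described in the abstract (training on exact posteriors beats one-hot labels) amounts to swapping the cross-entropy target. A minimal numpy sketch, with Dirichlet-sampled posteriors as a hypothetical stand-in for the paper's flow oracle:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 3

# Hypothetical oracle posteriors p(y|x); hard labels are their argmax one-hots.
p_soft = rng.dirichlet(np.ones(k), size=n)
p_hard = np.eye(k)[p_soft.argmax(axis=1)]

# Some model's predicted distributions.
q = rng.dirichlet(np.ones(k), size=n)

def ce(target, pred):
    """Mean cross-entropy between target and predicted distributions (nats)."""
    return float(-(target * np.log(pred + 1e-12)).sum(axis=1).mean())

# Soft targets reward matching the full posterior (good calibration on ambiguous
# inputs); hard targets push all mass onto one class regardless of ambiguity.
print("soft-label loss:", ce(p_soft, q))
print("hard-label loss:", ce(p_hard, q))
```

The soft-label loss is minimized exactly when the model reproduces the oracle posterior (cross-entropy H(p, q) = H(p) + KL(p || q)), which is consistent with the near-perfect calibration the abstract reports.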
Arian Khorasani
Mila-Quebec AI Institute, Canada
Nathaniel Chen
Plasma Physics Lab, Princeton University, USA
Yug D Oswal
School of Computer Science and Engineering, Vellore Institute of Technology, India
Akshat Santhana Gopalan
High School Student, John P. Stevens High School, USA
Egemen Kolemen
Princeton University
Plasma Control
Ravid Shwartz-Ziv
New York University
machine learning, deep learning, representation learning theory, neuroscience