🤖 AI Summary
This work investigates the statistical efficiency of multi-distribution learning under bounded label noise, asking whether the fast \(1/\varepsilon\) rates of single-task PAC learning carry over to this setting and how the sample complexity depends on the number of distributions \(k\). By constructing a structured hypothesis-testing framework combined with minimax analysis, the study establishes, for the first time, that unless each distribution is learned separately, multi-distribution learning inevitably incurs a sample complexity scaling with \(k/\varepsilon^2\) under bounded noise. Furthermore, it shows that when competing with each distribution's Bayes-optimal error, the sample complexity incurs a multiplicative penalty in \(k\). Together these results rigorously separate random classification noise from Massart noise in statistical terms, identifying a learning bottleneck inherent to the multi-source setting.
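For context, here is a minimal sketch of the two standard noise models the summary contrasts; the notation (\(f^*\), \(\eta(x)\), \(\eta\)) is ours and may differ from the paper's.

```latex
% Massart (bounded) noise: each label disagrees with the target f^*
% with an instance-dependent probability bounded away from 1/2.
\[
  \Pr[\, y \neq f^*(x) \mid x \,] = \eta(x) \le \eta < \tfrac{1}{2}.
\]
% Random classification noise is the special case \eta(x) \equiv \eta:
% every label is flipped with the same fixed probability.
```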
📝 Abstract
Towards understanding the statistical complexity of learning from heterogeneous sources, we study the problem of multi-distribution learning. Given $k$ data sources, the goal is to output a classifier for each source by exploiting shared structure to reduce sample complexity. We focus on the bounded label noise setting to determine whether the fast $1/\epsilon$ rates achievable in single-task learning extend to this regime with minimal dependence on $k$. Surprisingly, we show that this is not the case. We demonstrate that learning across $k$ distributions inherently incurs slow rates scaling with $k/\epsilon^2$, even under constant noise levels, unless each distribution is learned separately. A key technical contribution is a structured hypothesis-testing framework that captures the statistical cost of certifying near-optimality under bounded noise, a cost we show is unavoidable in the multi-distribution setting. Finally, we prove that when competing with the stronger benchmark of each distribution's optimal Bayes error, the sample complexity incurs a *multiplicative* penalty in $k$. This establishes a *statistical* separation between random classification noise and Massart noise, highlighting a fundamental barrier unique to learning from multiple sources.
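As an informal paraphrase of the rates claimed above (our summary, not a verbatim theorem statement; constants, dimension factors, and exact conditions are omitted):

```latex
% Single-task learning under bounded (Massart) noise admits fast rates,
% per the abstract:
\[
  n_{\mathrm{single}} = O\!\left(\tfrac{1}{\epsilon}\right),
\]
% whereas any multi-distribution learner that does not simply learn the
% k sources separately is forced onto slow rates, even at constant
% noise levels:
\[
  n_{\mathrm{multi}} = \Omega\!\left(\tfrac{k}{\epsilon^{2}}\right).
\]
```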