🤖 AI Summary
This paper investigates the fundamental trade-off between computational efficiency and statistical sample complexity in PAC learning: under standard worst-case complexity assumptions, must computational efficiency come at the cost of higher sample requirements? Circumventing the formal barriers of Applebaum, Barak, and Xiao (2008), it provides the first NP-hardness-based computational-statistical tradeoffs for improper learning. The approach combines average-case complexity analysis, VC-dimension theory, and NP-hardness reductions to construct learning problems over subclasses of polynomial-size circuits. The key results show that if NP requires exponential time, then for every polynomial p(n) there is a function class of VC dimension one whose time-efficient sample complexity is exactly Θ(p(n)), so efficient learners can be forced to use arbitrarily large polynomial sample complexity despite constant VC dimension. Furthermore, the paper proves that all NP-enumerable function classes are efficiently learnable with near-optimal sample complexity if and only if RP = NP. This work thus establishes, for the first time, the classical RP versus NP question as a precise criterion for the feasibility of efficient learning.
📝 Abstract
A central question in computer science and statistics is whether efficient algorithms can achieve the information-theoretic limits of statistical problems. Many computational-statistical tradeoffs have been shown under average-case assumptions, but since statistical problems are average-case in nature, it has been a challenge to base them on standard worst-case assumptions.
In PAC learning, where such tradeoffs were first studied, the question is whether computational efficiency can come at the cost of using more samples than information-theoretically necessary. We base such tradeoffs on $\mathsf{NP}$-hardness and obtain:
$\circ$ Sharp computational-statistical tradeoffs assuming $\mathsf{NP}$ requires exponential time: For every polynomial $p(n)$, there is an $n$-variate class $C$ with VC dimension $1$ such that the sample complexity of time-efficiently learning $C$ is $\Theta(p(n))$.
$\circ$ A characterization of $\mathsf{RP}$ vs. $\mathsf{NP}$ in terms of learning: $\mathsf{RP} = \mathsf{NP}$ iff every $\mathsf{NP}$-enumerable class is learnable with $O(\mathrm{VCdim}(C))$ samples in polynomial time. The forward implication has been known since (Pitt and Valiant, 1988); we prove the reverse implication.
Notably, all our lower bounds hold against improper learners. These are the first $\mathsf{NP}$-hardness results for improperly learning a subclass of polynomial-size circuits, circumventing formal barriers of Applebaum, Barak, and Xiao (2008).
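Both results are stated in terms of VC dimension, the combinatorial measure that governs information-theoretic sample complexity. As a self-contained illustration (not part of the paper), the following sketch brute-forces the VC dimension of a finite class by checking which point sets it shatters; one-dimensional threshold functions are a classic example of a class with VC dimension 1, like the classes in the first result.

```python
from itertools import combinations

def shatters(hypotheses, points):
    """A class shatters `points` iff it realizes all 2^|points| labelings."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(hypotheses, domain):
    """Brute-force VC dimension of a finite class over a finite domain.

    If a class shatters a set of size k+1 it shatters every subset,
    so the largest shattered size can be found by increasing k and
    stopping at the first size with no shattered set.
    """
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, s) for s in combinations(domain, k)):
            d = k
        else:
            break
    return d

# Threshold functions h_t(x) = 1 iff x >= t over {0, ..., 7}.
# Any single point is shattered, but no pair {x1 < x2} admits the
# labeling (1, 0), since x1 >= t implies x2 >= t. Hence VC dimension 1.
domain = list(range(8))
thresholds = [(lambda x, t=t: int(x >= t)) for t in range(9)]
print(vc_dimension(thresholds, domain))  # prints 1
```

The paper's point is that such a combinatorially trivial class (constant VC dimension, hence few samples information-theoretically) can still force any polynomial-time learner, even an improper one, to use polynomially many samples.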