🤖 AI Summary
This paper investigates the sample complexity of testing $k$-junta functions in the distribution-free model, with emphasis on tight lower bounds for tolerant testing and their implications for feature selection. Using combinatorial analysis, information-theoretic lower bound constructions, and adaptive variable importance estimation, the authors establish the first tight sample complexity bound for a natural class of Boolean functions under the distribution-free model: $Thetaig(frac{1}{varepsilon}ig(sqrt{2^k log inom{n}{k}} + log inom{n}{k}ig)ig)$. Moreover, they prove a strong lower bound of $Omegaig(2^{(1-o(1))k} + log inom{n}{k}ig)$ for tolerant $k$-junta testing. A key conceptual contribution is the revelation that, in terms of sample complexity, there is no asymptotic separation between tolerant testing and learning—resolving a long-standing question. This yields the first precise theoretical characterization of the fundamental limits of high-dimensional feature screening in the distribution-free setting.
📝 Abstract
We prove tight upper and lower bounds of $Thetaleft( frac{1}{epsilon}left( sqrt{2^k loginom{n}{k} } + loginom{n}{k}
ight)
ight)$ on the number of samples required for distribution-free $k$-junta testing. This is the first tight bound for testing a natural class of Boolean functions in the distribution-free sample-based model. Our bounds also hold for the feature selection problem, showing that a junta tester must learn the set of relevant variables. For tolerant junta testing, we prove a sample lower bound of $Omega(2^{(1-o(1)) k} + loginom{n}{k})$ showing that, unlike standard testing, there is no large gap between tolerant testing and learning.