Testing Juntas Optimally with Samples

📅 2025-05-07

📈 Citations: 0

✨ Influential: 0

career value

195K/year

🤖 AI Summary

This paper investigates the sample complexity of testing $k$-junta functions in the distribution-free model, with emphasis on tight lower bounds for tolerant testing and their implications for feature selection. Using combinatorial analysis, information-theoretic lower bound constructions, and adaptive variable importance estimation, the authors establish the first tight sample complexity bound for a natural class of Boolean functions under the distribution-free model: $Thetaig(frac{1}{varepsilon}ig(sqrt{2^k log inom{n}{k}} + log inom{n}{k}ig)ig)$. Moreover, they prove a strong lower bound of $Omegaig(2^{(1-o(1))k} + log inom{n}{k}ig)$ for tolerant $k$-junta testing. A key conceptual contribution is the revelation that, in terms of sample complexity, there is no asymptotic separation between tolerant testing and learning—resolving a long-standing question. This yields the first precise theoretical characterization of the fundamental limits of high-dimensional feature screening in the distribution-free setting.

Technology Category

Application Category

📝 Abstract

We prove tight upper and lower bounds of $Thetaleft( frac{1}{epsilon}left( sqrt{2^k loginom{n}{k} } + loginom{n}{k} ight) ight)$ on the number of samples required for distribution-free $k$-junta testing. This is the first tight bound for testing a natural class of Boolean functions in the distribution-free sample-based model. Our bounds also hold for the feature selection problem, showing that a junta tester must learn the set of relevant variables. For tolerant junta testing, we prove a sample lower bound of $Omega(2^{(1-o(1)) k} + loginom{n}{k})$ showing that, unlike standard testing, there is no large gap between tolerant testing and learning.

Problem

Research questions and friction points this paper is trying to address.

Determining optimal sample bounds for k-junta testing

Establishing connection between junta testing and feature selection

Proving lower bounds for tolerant junta testing complexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal sample bounds for junta testing

First tight bound for Boolean functions

Tolerant testing requires exponential samples

🔎 Similar Papers

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation