🤖 AI Summary
This work bridges hierarchical function modeling and the analysis of gradient-descent optimization in deep learning theory. We study hierarchical functions, constructed by repeated composition of nonlinear functions, defined over general independent product measures. First, we extend the noise-sensitivity theory of Boolean functions from i.i.d. Bernoulli inputs to arbitrary product measures, showing that such hierarchical functions are inherently highly noise sensitive. Using Fourier analysis and probabilistic methods, we rigorously derive Ω(1/ε²) lower bounds on either the sample complexity or the network width under broad conditions. Our key contribution is a universal connection between noise sensitivity and learnability lower bounds for deep networks, yielding the first theoretical framework for deriving hardness results under non-uniform and asymmetric input distributions. This provides foundational insight into the intrinsic limitations of deep learning in realistic data settings.
📝 Abstract
Recent works explain deep learning's success by examining functions or data with hierarchical structure. Complementary research on the performance of gradient descent for deep networks has shown that the noise sensitivity of a function under independent and identically distributed (i.i.d.) Bernoulli inputs yields lower bounds on learning complexity. This paper bridges these research streams by demonstrating that functions constructed through repeated composition of non-linear functions are noise sensitive under general product measures.
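The noise-sensitivity notion above can be illustrated with a small Monte Carlo sketch. The hierarchical function here (recursive 3-wise majority) and the specific marginals are illustrative choices, not the paper's construction: we draw x from a product measure with arbitrary per-coordinate marginals, form y by independently resampling each coordinate with probability δ, and estimate NS_δ(f) = Pr[f(x) ≠ f(y)].

```python
# Hypothetical illustration: Monte Carlo estimate of noise sensitivity
# NS_delta(f) = Pr[f(x) != f(y)] under a general (non-uniform) product measure.
import random

def recursive_majority(bits):
    # Hierarchical function: repeated composition of 3-wise majority.
    # Input length should be a power of 3.
    while len(bits) > 1:
        bits = [1 if b0 + b1 + b2 >= 2 else 0
                for b0, b1, b2 in zip(bits[0::3], bits[1::3], bits[2::3])]
    return bits[0]

def noise_sensitivity(f, marginals, delta, trials=20000, rng=random.Random(0)):
    disagree = 0
    for _ in range(trials):
        # x drawn from the product measure with given Bernoulli marginals.
        x = [1 if rng.random() < p else 0 for p in marginals]
        # y: each coordinate independently resampled from its marginal
        # with probability delta, otherwise copied from x.
        y = [(1 if rng.random() < p else 0) if rng.random() < delta else xi
             for xi, p in zip(x, marginals)]
        disagree += f(x) != f(y)
    return disagree / trials

# Asymmetric product measure over 27 inputs (a depth-3 composition).
marginals = [0.3] * 27
ns = noise_sensitivity(recursive_majority, marginals, delta=0.1)
```

Deepening the composition (81, 243, ... inputs) drives the estimate up at fixed δ, which is the qualitative behavior the paper formalizes.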