🤖 AI Summary
This study investigates the fundamental trade-off between depth and width in ReLU neural networks that exactly represent the d-dimensional max function. The key idea is to link the function's nonsmooth structure to the clique structure of a graph induced by the first hidden layer, and then to apply tools from extremal graph theory (in particular, Turán's theorem) together with combinatorial arguments and an analysis of ReLU ridge structures. This yields the first unconditional superlinear width lower bound for networks of depth \(k \geq 3\), where \(k\) may grow with \(d\): for any \(3 \leq k \leq \log_2(\log_2(d))\), the network width must be at least \(\Omega\left(d^{1 + 1/(2^{k-2} - 1)}\right)\). The result reveals the intrinsic geometric complexity of the max function and confirms a depth hierarchy in expressive power.
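For context, the exact representability that these lower bounds constrain follows from the classical identity max(a, b) = ReLU(a − b) + b; applying it pairwise in a tournament gives a depth-⌈log₂ d⌉ ReLU network for the d-input max. A minimal sketch of this standard upper-bound construction (illustrative only; this is not the paper's lower-bound argument, and the function names are ours):

```python
def relu(x):
    # ReLU activation: max(x, 0)
    return x if x > 0 else 0.0

def max2(a, b):
    # Exact ReLU identity: max(a, b) = relu(a - b) + b
    return relu(a - b) + b

def max_d(xs):
    # Tournament reduction: each round halves the number of inputs,
    # so the ReLU depth is ceil(log2(len(xs))).
    xs = list(xs)
    while len(xs) > 1:
        paired = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2 == 1:
            paired.append(xs[-1])  # odd element passes through unchanged
        xs = paired
    return xs[0]

print(max_d([3.0, -1.0, 7.5, 2.0]))  # → 7.5
```

The lower bound in the paper says that once the depth is capped at a small constant k, no comparably narrow network can reproduce this function exactly.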
📝 Abstract
We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $\Omega\big(d^{1+\frac{1}{2^{k-2}-1}}\big)$ is necessary to represent the maximum for any depth $3\le k\le \log_2(\log_2(d))$. This is the first unconditional super-linear lower bound for this fundamental operator at depths $k\ge3$, and it holds even if the depth scales with $d$. Our proof technique is based on a combinatorial argument and associates the non-differentiable ridges of the maximum with cliques in a graph induced by the first hidden layer of the computing network, utilizing Turán's theorem from extremal graph theory to show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity that stems from the geometric structure of its non-differentiable hyperplanes, and provides a novel approach for proving lower bounds for deep neural networks.
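To see how the bound interpolates between quadratic width at depth 3 and near-linear width at larger depths, one can tabulate the exponent $1+\frac{1}{2^{k-2}-1}$ for small $k$ (a quick illustrative computation; the function name is ours):

```python
def width_exponent(k):
    # Exponent in the lower bound Omega(d^(1 + 1/(2^(k-2) - 1))),
    # stated for depths 3 <= k <= log2(log2(d)).
    assert k >= 3
    return 1 + 1 / (2 ** (k - 2) - 1)

for k in range(3, 7):
    print(f"depth k={k}: width = Omega(d^{width_exponent(k):.4f})")
# k=3 gives exponent 2 (quadratic width); the exponent decays toward 1 as k grows.
```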