🤖 AI Summary
This work investigates the optimal $L^p$-approximation capability of deep ReLU networks under joint constraints on the width $W$ and depth $L$, for functions in the Sobolev spaces $W^{s,q}([0,1]^d)$ and Besov spaces $B^s_{q,r}([0,1]^d)$. The key tool is a novel encoding of sparse vectors by networks of varying width and depth, combined with techniques from approximation theory and function-space embeddings. Under the general Sobolev embedding condition, the authors establish the convergence rate $O((WL)^{-2s/d})$, up to logarithmic factors, which is known to be optimal. The analysis unifies and extends prior bounds derived under either fixed-width or fixed-depth assumptions, yielding the sharpest known characterization of the expressive power of deep ReLU networks in terms of the joint $(W,L)$-scaling.
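To make the joint $(W,L)$-scaling concrete, here is a small arithmetic illustration; the exponent $2s/d$ comes from the rate stated above, while the specific values $s=1$, $d=2$ are chosen here purely for illustration:

```latex
% Illustration of the joint (W, L)-scaling of the error bound (WL)^{-2s/d}:
% doubling both the width W and the depth L multiplies the product WL by 4,
% so the bound shrinks by the factor 4^{-2s/d}.
\frac{(2W \cdot 2L)^{-2s/d}}{(WL)^{-2s/d}} = 4^{-2s/d},
\qquad \text{e.g. } s = 1,\ d = 2 \;\Rightarrow\; 4^{-2s/d} = \tfrac{1}{4}.
```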
📝 Abstract
This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $\mathcal{O}((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $\mathcal{O}(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q - 1/p < s/d$ holds. We generalize these results by showing that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.
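The main result described in the abstract can be written schematically as follows; note that the network class $\mathcal{NN}(W,L)$ (ReLU networks of width $W$ and depth $L$), the use of $\lesssim$, and the suppression of logarithmic factors are shorthand introduced here for readability, not the paper's exact formulation:

```latex
% Schematic statement of the approximation rate from the abstract (Besov case shown;
% the Sobolev case is analogous). Logarithmic factors are suppressed, and
% \mathcal{NN}(W, L) is shorthand for ReLU networks of width W and depth L.
\inf_{\phi \in \mathcal{NN}(W,L)}
  \|f - \phi\|_{L^p([0,1]^d)}
  \;\lesssim\; \|f\|_{\mathcal{B}^{s}_{q,r}([0,1]^d)} \, (WL)^{-2s/d},
\qquad \text{whenever } \tfrac{1}{q} - \tfrac{1}{p} < \tfrac{s}{d}.
```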