Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the adaptive approximation and learning capabilities of constant-depth neural networks with smooth activation functions for high-order smooth target functions. By constructing an explicit network architecture and analyzing approximation and estimation errors in Sobolev spaces, the authors demonstrate that the smoothness of the activation function alone enables constant-depth networks to automatically adapt to arbitrary levels of target smoothness, achieving minimax-optimal rates (up to logarithmic factors) in both approximation and statistical estimation error. The study shows that smooth activations can serve as a viable alternative to increasing network depth, overcoming the approximation-order bottleneck that non-smoothness imposes on ReLU-type networks. Notably, this result requires no sparsity assumptions and simultaneously ensures parameter controllability and statistical learnability.
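For reference, the minimax benchmarks invoked above are the classical rates for the Sobolev class $W^{s,\infty}([0,1]^d)$ (a standard gloss added here for context, not quoted from the paper): the optimal squared estimation error from $n$ samples is

$$
\inf_{\hat f}\ \sup_{f \in W^{s,\infty}([0,1]^d)} \mathbb{E}\,\lVert \hat f - f \rVert_{L^2}^2 \;\asymp\; n^{-\frac{2s}{2s+d}},
$$

while the best sup-norm approximation error achievable with $W$ free parameters scales as $W^{-s/d}$. "Minimax-optimal up to logarithmic factors" means the constant-depth smooth-activation networks match these rates up to $\mathrm{polylog}$ terms.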

📝 Abstract
Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we characterize both approximation and statistical properties of neural networks with smooth activations over the Sobolev space $W^{s,\infty}([0,1]^d)$ for arbitrary smoothness $s>0$. We prove that constant-depth networks equipped with smooth activations automatically exploit arbitrarily high orders of target function smoothness, achieving the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In sharp contrast, networks with non-smooth activations, such as ReLU, lack this adaptivity: their attainable approximation order is strictly limited by depth, and capturing higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, alternative to depth, for attaining statistical optimality. Technically, our results are established via a constructive approximation framework that produces explicit neural network approximators with carefully controlled parameter norms and model size. This complexity control ensures statistical learnability under empirical risk minimization (ERM) and removes the impractical sparsity constraints commonly required in prior analyses.
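The mechanism the abstract describes can be illustrated with a classical divided-difference construction (a sketch of the folklore argument, not the paper's actual architecture): because a smooth activation has nonvanishing higher-order derivatives, a fixed-depth combination of a few shifted units reproduces monomials of any degree, which is exactly what ReLU's piecewise linearity forbids at constant depth.

```python
import numpy as np

# Sketch: for a smooth activation sigma, the second-order finite difference
#   [sigma(t0) - 2*sigma(t0 + h*x) + sigma(t0 + 2*h*x)] / (h**2 * sigma''(t0))
# equals x**2 + O(h), so three tanh units on a single hidden layer already
# reproduce a degree-2 monomial. ReLU admits no such expansion: its second
# derivative vanishes almost everywhere.

def tanh_dd2(t):
    """Second derivative of tanh: -2 tanh(t) sech^2(t)."""
    return -2.0 * np.tanh(t) * (1.0 - np.tanh(t) ** 2)

def square_via_tanh(x, t0=0.5, h=0.005):
    """Approximate x**2 with a width-3, depth-1 tanh 'network'.

    t0 is any point where tanh''(t0) != 0; h controls the O(h) error.
    """
    num = np.tanh(t0) - 2.0 * np.tanh(t0 + h * x) + np.tanh(t0 + 2.0 * h * x)
    return num / (h ** 2 * tanh_dd2(t0))

x = np.linspace(-1.0, 1.0, 201)
err = np.max(np.abs(square_via_tanh(x) - x ** 2))
print(f"max |approx - x^2| on [-1,1]: {err:.2e}")  # shrinks like O(h)
```

Extending the same idea, a $k$-th order difference of $k+1$ shifted units yields $x^k$, the seed of constructions that approximate any $f \in W^{s,\infty}$ by local polynomial patches at constant depth.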
Problem

Research questions and friction points this paper is trying to address.

smooth activation
constant-depth neural networks
Sobolev space
approximation error
statistical optimality
Innovation

Methods, ideas, or system contributions that make the work stand out.

smooth activations
constant-depth networks
smoothness adaptivity
minimax optimality
constructive approximation
Yuhao Liu
Department of Mathematical Sciences, Tsinghua University
Zilin Wang
University of Oxford
Deep Reinforcement Learning, Autonomous Driving
Lei Wu
School of Mathematical Sciences, Peking University; Center for Machine Learning Research, Peking University; AI for Science Institute, Beijing
Shaobo Zhang
School of Mathematical Sciences, Peking University