Minimum width for universal approximation using squashable activation functions

📅 2025-04-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the minimal width required for feedforward neural networks with squashable activation functions to achieve universal approximation of functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ in the $L^p$ sense. Squashability—defined as the capacity of an activation function to arbitrarily approximate both the identity and the binary step function via alternating compositions with affine transformations—unifies two fundamental approximation capabilities. The authors establish that, for all non-affine analytic functions and a broad class of piecewise functions, the tight minimal width is $\max\{d_x, d_y, 2\}$ unless $d_x = d_y = 1$; the same bound holds in the $d_x = d_y = 1$ case when the activation function is monotone. This work constitutes the first systematic generalization of ReLU-type minimal-width results to a wide family of nonlinear activations. It introduces the concept of *squashability*, provides verifiable sufficient conditions for it, and thereby significantly extends both the applicability and structural understanding of universal approximation theory.

📝 Abstract
The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternatively composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.
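The two capabilities that define squashability can be illustrated numerically. The sketch below (an illustration of the general idea, not the paper's construction) shows how the standard sigmoid approximates the identity via a flattening affine rescaling and the binary step via a steepening one; the helper names and tolerance choices are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def approx_identity(x, eps=1e-3):
    # sigmoid(eps*x) ~ sigmoid(0) + sigmoid'(0)*eps*x for small eps,
    # so undoing that affine map recovers x (sigmoid'(0) = 1/4).
    return (sigmoid(eps * x) - 0.5) / (0.25 * eps)

def approx_step(x, scale=1e3):
    # Steepening the sigmoid drives it toward the binary step 1[x > 0].
    return sigmoid(scale * x)

x = np.linspace(-1.0, 1.0, 5)
print(np.max(np.abs(approx_identity(x) - x)))   # tiny approximation error
print(approx_step(np.array([-0.1, 0.1])))       # close to [0, 1]
```

Shrinking `eps` (resp. growing `scale`) makes either approximation arbitrarily accurate, which is exactly the property the paper's sufficient conditions verify for non-affine analytic activations.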
Problem

Research questions and friction points this paper is trying to address.

Determines the exact minimum width for universal approximation by unbounded-depth networks
Characterizes the width requirements of squashable activation functions
Extends ReLU-specific results to non-affine analytic and piecewise activations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tight minimum-width bound $\max\{d_x,d_y,2\}$ for universal approximation
Squashable activation functions generalize ReLU-type analyses
Sufficient conditions showing non-affine analytic functions are squashable
Jonghyun Shin
Department of Mathematics Education, Korea University
Namjun Kim
Department of Artificial Intelligence, Korea University
Geonho Hwang
Gwangju Institute of Science and Technology
Sejun Park
Assistant Professor, Korea University