🤖 AI Summary
This paper investigates the minimum width required for feedforward neural networks with squashable activation functions to achieve universal approximation of functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ in the $L^p$ sense. Squashability, defined as the ability of an activation function to approximate both the identity function and the binary step function arbitrarily well via compositions with affine maps, unifies two fundamental approximation capabilities. The authors establish that, for all non-affine analytic activation functions and a broad class of piecewise activation functions, the minimum width is exactly $\max\{d_x, d_y, 2\}$; this bound also holds when $d_x = d_y = 1$, provided the activation function is monotone. The work generalizes ReLU-type minimum-width results to a wide family of nonlinear activations, introduces the notion of *squashability*, provides verifiable sufficient conditions for it, and thereby extends both the applicability and the structural understanding of universal approximation theory.
📝 Abstract
The exact minimum width that allows for universal approximation by networks of unbounded depth is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions: those that can approximate the identity function and the binary step function by alternately composing them with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for these general classes of activation functions.
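To make the squashability notion concrete, here is a minimal numerical sketch, assuming the sigmoid activation $\sigma(x) = 1/(1+e^{-x})$ (an illustrative choice, not the paper's formal construction, and the scaling parameters `eps` and `scale` are arbitrary): composing $\sigma$ with affine maps yields arbitrarily good approximations of the identity on a bounded interval and of the binary step function.

```python
# Illustrative sketch of "squashability" for the sigmoid activation
# sigma(x) = 1 / (1 + exp(-x)); not the paper's construction.
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-1.0, 1.0, 201)

# (i) Approximate the identity: shrink the input so sigma acts near 0,
#     where it is nearly linear, then undo the scaling with another affine map.
#     Since sigma'(0) = 1/4, we have (sigma(eps * x) - 1/2) / (eps / 4) ~ x for small eps.
eps = 1e-3
identity_approx = (sigma(eps * x) - 0.5) / (eps * 0.25)
print("max |identity error|:", np.max(np.abs(identity_approx - x)))  # ~1e-7

# (ii) Approximate the binary step 1{x >= 0}: blow up the input so the
#      sigmoid's transition becomes arbitrarily sharp.
scale = 1e4
step_approx = sigma(scale * x)
step_true = (x >= 0).astype(float)
# The error is small outside a shrinking neighborhood of 0, which suffices in the L^p sense.
mask = np.abs(x) > 0.01
print("max |step error| away from 0:", np.max(np.abs(step_approx - step_true)[mask]))
```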