SUPN: Shallow Universal Polynomial Networks

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the large parameter counts, training instability, and initialization-sensitive generalization of deep neural networks (DNNs) and Kolmogorov–Arnold networks (KANs), this paper proposes the shallow universal polynomial network (SUPN): a shallow architecture in which all but the last hidden layer are replaced by a single layer of polynomials with trainable coefficients, so that a standard nonlinearity is retained only near the output. The authors prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree and derive explicit formulas for quasi-optimal SUPN parameters. An experimental suite of over 13,000 trained models shows that, at equal parameter count, SUPNs often reduce approximation error and its variability by an order of magnitude relative to DNNs and KANs; in the reported examples they even outperform classical polynomial projection on non-smooth functions, striking a favorable balance among expressivity, computational cost, and training stability.
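As a concrete illustration of the architecture described above, the sketch below builds a SUPN-style model in PyTorch: a single layer of polynomials with trainable coefficients feeding one remaining standard hidden layer and a linear output. The Chebyshev basis, tanh activation, widths, and the way per-coordinate polynomials are summed into hidden units are illustrative assumptions, not the paper's exact construction.

```python
# Minimal SUPN-style sketch (assumed details: Chebyshev basis, per-coordinate
# polynomials summed into each hidden unit, tanh in the one remaining hidden
# layer). The paper's exact parameterization may differ.
import torch
import torch.nn as nn


def chebyshev_features(x: torch.Tensor, degree: int) -> torch.Tensor:
    """Evaluate T_0..T_degree at x via the three-term recurrence.

    x: (batch, d) with entries assumed in [-1, 1]; returns (batch, d, degree+1).
    """
    feats = [torch.ones_like(x), x]
    for _ in range(2, degree + 1):
        feats.append(2 * x * feats[-1] - feats[-2])
    return torch.stack(feats[: degree + 1], dim=-1)


class SUPNSketch(nn.Module):
    def __init__(self, in_dim: int, width: int, degree: int):
        super().__init__()
        self.degree = degree
        # Trainable polynomial coefficients: one univariate polynomial per
        # (hidden unit, input coordinate) pair.
        self.coeffs = nn.Parameter(0.01 * torch.randn(width, in_dim, degree + 1))
        self.hidden = nn.Linear(width, width)  # the single remaining hidden layer
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        basis = chebyshev_features(x, self.degree)           # (batch, d, deg+1)
        poly = torch.einsum("bdk,wdk->bw", basis, self.coeffs)
        return self.out(torch.tanh(self.hidden(poly)))


model = SUPNSketch(in_dim=2, width=16, degree=8)
y = model(2 * torch.rand(32, 2) - 1)  # inputs rescaled to [-1, 1]
```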

📝 Abstract
Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a large number of trainable parameters to produce a suitable approximation. Beyond making the resulting network less transparent, overparameterization creates a large optimization space, likely producing local minima in training that have quite different generalization errors. In this case, network initialization can have an outsize impact on the model's out-of-sample accuracy. For these reasons, we propose shallow universal polynomial networks (SUPNs). These networks replace all but the last hidden layer with a single layer of polynomials with learnable coefficients, leveraging the strengths of DNNs and polynomials to achieve sufficient expressivity with far fewer parameters. We prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree, and we derive explicit formulas for quasi-optimal SUPN parameters. We complement theory with an extensive suite of numerical experiments involving SUPNs, DNNs, KANs, and polynomial projection in one, two, and ten dimensions, consisting of over 13,000 trained models. On the target functions we numerically studied, for a given number of trainable parameters, the approximation error and variability are often lower for SUPNs than for DNNs and KANs by an order of magnitude. In our examples, SUPNs even outperform polynomial projection on non-smooth functions.
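For reference, the classical polynomial-projection baseline mentioned in the abstract can be sketched as a plain least-squares fit in a Chebyshev basis. The one-dimensional target, degree, and sample count below are illustrative choices, not the paper's experimental setup.

```python
# Sketch of the polynomial-projection baseline: a 1-D least-squares Chebyshev
# fit to a non-smooth target. Degree, sample count, and target are assumptions.
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = np.abs(x)                        # a simple non-smooth target

coeffs = C.chebfit(x, y, deg=20)     # least-squares projection onto T_0..T_20
x_test = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(C.chebval(x_test, coeffs) - np.abs(x_test)))
print(f"degree-20 projection, max error on [-1, 1]: {err:.3e}")
```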
Problem

Research questions and friction points this paper is trying to address.

Large numbers of trainable parameters required by DNNs and KANs for function approximation
Overparameterization producing local minima with widely varying generalization error
Out-of-sample accuracy that is highly sensitive to network initialization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shallow universal polynomial networks (SUPNs) replace all but the last hidden layer with a single layer of polynomials
Learnable polynomial coefficients provide expressivity with far fewer parameters
Proven to converge at the rate of the best polynomial approximation of the same degree, with explicit quasi-optimal parameter formulas
Zachary Morrow
Sandia National Laboratories, 1515 Eubank Blvd SE, Albuquerque, NM 87123
Michael Penwarden
Senior Member of Technical Staff at Sandia National Laboratories
SciML | PINNs | Neural Operators
Brian Chen
Google DeepMind; Samsung Research America; Columbia University
Computer vision | Vision and Language | Multimodal Learning
Aurya Javeed
Sandia National Laboratories, 1515 Eubank Blvd SE, Albuquerque, NM 87123
Akil Narayan
University of Utah
Scientific computing | numerical analysis | uncertainty quantification
John D. Jakeman
Sandia National Laboratories, 1515 Eubank Blvd SE, Albuquerque, NM 87123