SUPN: Shallow Universal Polynomial Networks

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the large parameter counts, training instability, and initialization-sensitive generalization of deep neural networks (DNNs) and Kolmogorov–Arnold networks (KANs), this paper proposes the shallow universal polynomial network (SUPN): a shallow architecture in which all but the last hidden layer are replaced by a single layer of polynomials with trainable coefficients, so that a standard nonlinearity is retained only near the output. The authors prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree and derive explicit formulas for quasi-optimal SUPN parameters. An experimental suite of over 13,000 trained models shows that, at equal parameter count, SUPNs often reduce approximation error and its variability by an order of magnitude relative to DNNs and KANs; in the reported examples they even outperform classical polynomial projection on non-smooth functions, striking a favorable balance among expressivity, computational cost, and training stability.
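As a concrete illustration of the architecture described above, the sketch below builds a SUPN-style model in PyTorch: a single layer of polynomials with trainable coefficients feeding one remaining standard hidden layer and a linear output. The Chebyshev basis, tanh activation, widths, and the way per-coordinate polynomials are summed into hidden units are illustrative assumptions, not the paper's exact construction.

```python
# Minimal SUPN-style sketch (assumed details: Chebyshev basis, per-coordinate
# polynomials summed into each hidden unit, tanh in the one remaining hidden
# layer). The paper's exact parameterization may differ.
import torch
import torch.nn as nn


def chebyshev_features(x: torch.Tensor, degree: int) -> torch.Tensor:
    """Evaluate T_0..T_degree at x via the three-term recurrence.

    x: (batch, d) with entries assumed in [-1, 1]; returns (batch, d, degree+1).
    """
    feats = [torch.ones_like(x), x]
    for _ in range(2, degree + 1):
        feats.append(2 * x * feats[-1] - feats[-2])
    return torch.stack(feats[: degree + 1], dim=-1)


class SUPNSketch(nn.Module):
    def __init__(self, in_dim: int, width: int, degree: int):
        super().__init__()
        self.degree = degree
        # Trainable polynomial coefficients: one univariate polynomial per
        # (hidden unit, input coordinate) pair.
        self.coeffs = nn.Parameter(0.01 * torch.randn(width, in_dim, degree + 1))
        self.hidden = nn.Linear(width, width)  # the single remaining hidden layer
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        basis = chebyshev_features(x, self.degree)           # (batch, d, deg+1)
        poly = torch.einsum("bdk,wdk->bw", basis, self.coeffs)
        return self.out(torch.tanh(self.hidden(poly)))


model = SUPNSketch(in_dim=2, width=16, degree=8)
y = model(2 * torch.rand(32, 2) - 1)  # inputs rescaled to [-1, 1]
```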

📝 Abstract
Deep neural networks (DNNs) and Kolmogorov-Arnold networks (KANs) are popular methods for function approximation due to their flexibility and expressivity. However, they typically require a large number of trainable parameters to produce a suitable approximation. Beyond making the resulting network less transparent, overparameterization creates a large optimization space, likely producing local minima in training that have quite different generalization errors. In this case, network initialization can have an outsize impact on the model's out-of-sample accuracy. For these reasons, we propose shallow universal polynomial networks (SUPNs). These networks replace all but the last hidden layer with a single layer of polynomials with learnable coefficients, leveraging the strengths of DNNs and polynomials to achieve sufficient expressivity with far fewer parameters. We prove that SUPNs converge at the same rate as the best polynomial approximation of the same degree, and we derive explicit formulas for quasi-optimal SUPN parameters. We complement theory with an extensive suite of numerical experiments involving SUPNs, DNNs, KANs, and polynomial projection in one, two, and ten dimensions, consisting of over 13,000 trained models. On the target functions we numerically studied, for a given number of trainable parameters, the approximation error and variability are often lower for SUPNs than for DNNs and KANs by an order of magnitude. In our examples, SUPNs even outperform polynomial projection on non-smooth functions.
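For reference, the classical polynomial-projection baseline mentioned in the abstract can be sketched as a plain least-squares fit in a Chebyshev basis. The one-dimensional target, degree, and sample count below are illustrative choices, not the paper's experimental setup.

```python
# Sketch of the polynomial-projection baseline: a 1-D least-squares Chebyshev
# fit to a non-smooth target. Degree, sample count, and target are assumptions.
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = np.abs(x)                        # a simple non-smooth target

coeffs = C.chebfit(x, y, deg=20)     # least-squares projection onto T_0..T_20
x_test = np.linspace(-1.0, 1.0, 1001)
err = np.max(np.abs(C.chebval(x_test, coeffs) - np.abs(x_test)))
print(f"degree-20 projection, max error on [-1, 1]: {err:.3e}")
```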
Problem

Research questions and friction points this paper is trying to address.

Large numbers of trainable parameters required by DNNs and KANs for function approximation
Overparameterization producing local minima with widely varying generalization error
Out-of-sample accuracy that is highly sensitive to network initialization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shallow universal polynomial networks (SUPNs) replace all but the last hidden layer with a single layer of polynomials
Learnable polynomial coefficients provide expressivity with far fewer parameters
Proven to converge at the rate of the best polynomial approximation of the same degree, with explicit quasi-optimal parameter formulas
Zachary Morrow
Sandia National Laboratories, 1515 Eubank Blvd SE, Albuquerque, NM 87123
Michael Penwarden
Senior Member of Technical Staff at Sandia National Laboratories
SciML | PINNs | Neural Operators
Brian Chen
Google DeepMind; Samsung Research America; Columbia University
Computer vision | Vision and Language | Multimodal Learning
Aurya Javeed
Sandia National Laboratories, 1515 Eubank Blvd SE, Albuquerque, NM 87123
Akil Narayan
University of Utah
Scientific computing | numerical analysis | uncertainty quantification
John D. Jakeman
Sandia National Laboratories, 1515 Eubank Blvd SE, Albuquerque, NM 87123