Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

📅 2026-05-07
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limited expressivity of low-rank neural networks that lack a dense prior: without a pretrained dense base matrix, layers constrained purely to low rank cannot achieve universal approximation. The authors propose a Diagonal plus Low-Rank (DLoR) architecture that restores universal approximation by adding only a minimally sparse diagonal component to the low-rank term, without requiring dense matrices or activation functions with special properties. The theoretical analysis shows that any full-rank linear transformation can be reconstructed exactly from DLoR components, either additively (across width) or multiplicatively (across depth), and that DLoR networks satisfy the universal approximation theorem for general activation functions. Moreover, multiplicative depth is shown to give better parameter-to-expressivity scaling than additive width. The study thus removes the expressivity bottleneck of purely low-rank approaches and establishes that a diagonal-plus-low-rank structure alone can combine parameter efficiency with universal approximation.
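As a rough illustration of the DLoR structure, the sketch below implements one layer in NumPy as W = diag(d) + U V^T and applies it without ever forming the dense matrix. The parameterization, the function names (`dlor_forward`, `dlor_matrix`, `dlor_param_count`), the `tanh` activation and the bias are assumptions made for illustration; the paper's exact layer definition may differ.

```python
import numpy as np

def dlor_matrix(d, U, V):
    """Dense equivalent W = diag(d) + U @ V.T (built only for checking)."""
    return np.diag(d) + U @ V.T

def dlor_forward(x, d, U, V, b, act=np.tanh):
    """Hypothetical DLoR layer: act(diag(d) x + U (V^T x) + b).

    Costs O(n r) per input vector; the dense n-by-n matrix is never formed.
    """
    return act(d * x + U @ (V.T @ x) + b)

def dlor_param_count(n, r):
    """n diagonal entries plus two n-by-r factors (bias not counted)."""
    return n + 2 * n * r

rng = np.random.default_rng(0)
n, r = 512, 4
d = rng.standard_normal(n)                      # diagonal component
U = rng.standard_normal((n, r)) / np.sqrt(n)    # low-rank factors
V = rng.standard_normal((n, r)) / np.sqrt(n)
b = np.zeros(n)
x = rng.standard_normal(n)

y_fast = dlor_forward(x, d, U, V, b)
y_dense = np.tanh(dlor_matrix(d, U, V) @ x + b)
print(np.allclose(y_fast, y_dense))             # True
print(dlor_param_count(n, r), n * n)            # 4608 262144

# The low-rank term alone has rank r; adding the diagonal generically makes W full rank.
print(np.linalg.matrix_rank(U @ V.T), np.linalg.matrix_rank(dlor_matrix(d, U, V)))
```

The last line shows why the diagonal matters: U V^T can never exceed rank r, while diag(d) + U V^T is full rank for generic d, at a cost of n + 2nr parameters rather than n².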
📝 Abstract
The massive computational costs of scaling modern deep learning architectures have driven the widespread use of parameter-efficient low-rank structures, such as LoRA and low-rank factorization. However, theoretical guarantees for their expressive power remain less explored and often rely on restrictive priors such as a pretrained base matrix, ReLU activations, or non-verifiable singularity conditions. We first investigate the limits of neural networks constrained strictly to low-rank manifolds without pretrained dense priors. We demonstrate a theoretical paradox: while purely rank-1 layers can exactly interpolate arbitrary scalar datasets, they collapse when used for function approximation. To overcome this bottleneck without surrendering parameter efficiency, we introduce a unified Structural Correspondence framework. We prove that augmenting low-rank layers with only a minimal sparse diagonal component, i.e., a Diagonal plus Low-Rank (DLoR) structure, is sufficient to reach Universal Approximation. We show that any full-rank transformation can be exactly reconstructed from DLoR components, trading off network width (additive decomposition) against depth (multiplicative decomposition). By tracking asymptotic Taylor remainders, we prove that DLoR neural networks fully restore the Universal Approximation Theorem for general activation functions. Finally, we establish that multiplicative depth provides superior parameter-to-expressivity scaling compared to additive width. Our results show that dense matrices and specific activation functions are not topological prerequisites for universal expressivity.
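The exact-reconstruction claim can be checked numerically with two standard linear-algebra constructions; these are illustrative assumptions, not necessarily the constructions used in the paper, and all function names are hypothetical. The additive (width) version takes the SVD of W - diag(d) and splits it into ceil(n/r) rank-r pieces, one per parallel DLoR branch. The multiplicative (depth) version writes W as a product of n identity-plus-rank-1 factors, each of which replaces one column; it assumes the leading principal minors of W are nonzero, which holds almost surely for a random matrix.

```python
import numpy as np

def dense(d, U, V):
    """Dense form of one DLoR component: diag(d) + U @ V.T."""
    return np.diag(d) + U @ V.T

def additive_dlor(W, r, d):
    """Width: W = sum_j (diag(d_j) + U_j V_j^T) with ceil(n / r) branches of rank <= r."""
    n = W.shape[0]
    P, s, Qt = np.linalg.svd(W - np.diag(d))    # residual R = P diag(s) Qt
    branches = []
    for j in range(0, n, r):
        U = P[:, j:j + r] * s[j:j + r]           # scale columns by singular values
        V = Qt[j:j + r, :].T
        d_j = d if j == 0 else np.zeros(n)       # diagonal kept in the first branch only
        branches.append((d_j, U, V))
    return branches

def multiplicative_dlor(W):
    """Depth: W = E_1 E_2 ... E_n, each E_k = I + (c_k - e_k) e_k^T (diagonal = 1, rank 1).

    Assumes the leading principal minors of W are nonzero so each solve is well posed.
    """
    n = W.shape[0]
    M = np.eye(n)                                # running product E_1 ... E_{k-1}
    factors = []
    for k in range(n):
        c = np.linalg.solve(M, W[:, k])          # pick c_k so column k of M E_k equals W[:, k]
        e = np.zeros(n)
        e[k] = 1.0
        factor = (np.ones(n), (c - e)[:, None], e[:, None])
        factors.append(factor)
        M = M @ dense(*factor)
    return factors

rng = np.random.default_rng(1)
n, r = 6, 2
W = rng.standard_normal((n, n))

branches = additive_dlor(W, r, d=np.diag(W).copy())
W_add = sum(dense(*br) for br in branches)
print(len(branches), np.allclose(W_add, W))     # 3 True

factors = multiplicative_dlor(W)
W_mul = np.eye(n)
for f in factors:
    W_mul = W_mul @ dense(*f)
print(len(factors), np.allclose(W_mul, W))      # 6 True
```

Applied to an input, the product acts right to left, so in a deep linear network E_n is the first layer. Both constructions only demonstrate exactness; they are not tuned for parameter efficiency, and the width-versus-depth scaling comparison from the abstract is not reproduced here.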
Problem

Research questions and friction points this paper is trying to address.

Low-Rank Neural Networks
Universal Approximation
Parameter Efficiency
Function Approximation
Expressive Power
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structural Correspondence
Diagonal plus Low-Rank
Universal Approximation
Low-Rank Neural Networks
Parameter Efficiency