Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies how to learn high-dimensional sparse compositional functions in overparameterized settings, where the number of parameters exceeds the sample size, using deep neural networks with a Frobenius norm constraint. Directed acyclic graphs (DAGs) represent the hierarchical sparsity structure of the target function. Measuring complexity by the parameter norm rather than the parameter count, the paper derives approximation rates and excess risk bounds for a range of representative models, including multi-index models, binary tree structures, and general compositional architectures. The rates show that deep networks can exploit the compositional structure of the target function and thereby avoid the curse of dimensionality (CoD) through hierarchical representations.
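To make the notion of a sparse compositional function concrete, here is a minimal sketch of a binary-tree target, one of the model classes the summary mentions. The function name `binary_tree_target` and the default node function are hypothetical choices for illustration and are not taken from the paper. The point is that a function of d inputs is assembled entirely from two-argument pieces, so each node of the DAG depends on only two values.

```python
import numpy as np


def binary_tree_target(x, node_fn=lambda a, b: np.tanh(a * b + a - b)):
    """Evaluate a hypothetical binary-tree compositional function.

    x       : 1-D array whose length d is a power of two.
    node_fn : bivariate function applied at every internal node
              (an illustrative assumption, not the paper's choice).
    """
    level = np.asarray(x, dtype=float)
    d = level.size
    if level.ndim != 1 or d < 1 or (d & (d - 1)) != 0:
        raise ValueError("x must be 1-D with a power-of-two length")

    # Merge adjacent pairs level by level until only the root remains.
    # Each merge touches just two inputs, which is the sparsity that the
    # DAG representation encodes.
    while level.size > 1:
        level = node_fn(level[0::2], level[1::2])
    return float(level[0])
```

For d = 8 the tree has three levels and seven internal nodes, each a function of two variables, even though the overall function takes eight inputs.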

📝 Abstract

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates based on parameter counts and sample complexity guarantees for compositional models without incurring the curse of dimensionality (CoD). To study overparameterized regimes, where the number of parameters exceeds the sample size, we develop a framework that measures complexity via the parameter norm. Within this approach, we establish approximation rates and excess risk bounds for learning sparse compositional functions whose compositional structure is represented by directed acyclic graphs (DAGs), using Frobenius norm-constrained deep neural networks. Our results have broad applicability since every function that is efficiently Turing computable admits sparse compositional representations. In particular, we cover a range of representative models, including multi-index models, binary tree structures, and general compositional architectures. The rates we derive show that deep networks can exploit the compositional structure of the target functions, effectively avoiding the CoD through hierarchical representations.
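As a rough illustration of what a Frobenius norm constraint on network weights can look like in practice, the sketch below rescales each weight matrix so that its Frobenius norm stays within a given radius. This is an assumption made for illustration: the abstract does not state the exact form of the constraint (per layer, summed, or a product across layers), and the helper names `frobenius_norms` and `project_to_frobenius_ball` are hypothetical.

```python
import numpy as np


def frobenius_norms(weights):
    """Return the Frobenius norm of each weight matrix in a list."""
    return [float(np.linalg.norm(W, ord="fro")) for W in weights]


def project_to_frobenius_ball(weights, radius):
    """Rescale each matrix so that its Frobenius norm is at most `radius`.

    A per-layer constraint is assumed here for simplicity; the paper's
    actual constraint may be defined differently.
    """
    projected = []
    for W in weights:
        norm = np.linalg.norm(W, ord="fro")
        # Matrices already inside the ball are left unchanged.
        scale = 1.0 if norm <= radius else radius / norm
        projected.append(W * scale)
    return projected
```

Under a constraint of this kind the complexity of the model is controlled by `radius` rather than by the shapes of the matrices, which is why the width can grow without the complexity measure growing with it.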

Problem

Research questions and friction points this paper is trying to address.

sparse compositional functions

curse of dimensionality

overparameterized regimes

hierarchical representations

directed acyclic graphs

Innovation

Methods, ideas, or system contributions that make the work stand out.

norm-constrained neural networks

sparse compositional functions

curse of dimensionality