Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Ability

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
The strong generalization of over-parameterized deep neural networks trained by empirical risk minimization remains theoretically puzzling: classical generalization analyses, which rely on restrictive assumptions such as bounded VC dimension or algorithmic stability, fail to explain it. Method: The paper links generalization directly to the network's *expressive capacity* rather than to conventional complexity measures, deriving probabilistic generalization bounds under comparatively weak assumptions. Contribution/Results: It establishes an expressiveness-based lower bound on population accuracy for networks that minimize or approximately minimize empirical risk, showing that such networks, including over-parameterized ones, generalize once the training set and the network are sufficiently large. It also proves a necessary condition: for certain data distributions, the amount of training data required for generalization exceeds the network size needed to represent the distribution. Together these results quantify a data-model scale matching principle and give a unified, assumption-light account of robust generalization, the role of over-parameterization, and the effect of the loss function on generalization.
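The bound and the necessary condition summarized above can be written schematically. The LaTeX sketch below is purely illustrative: the notation ($n$, $\mathcal{H}$, $\epsilon$, $\mathrm{size}_{\min}$) and the exact shape of the inequalities are a paraphrase for orientation, not the paper's theorem statements or constants.

```latex
% Schematic only: an illustrative rendering of the kind of result summarized
% above, in my own notation; not the paper's exact theorems or constants.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
Let $\widehat{F}$ (approximately) minimize the empirical risk over a network
class $\mathcal{H}$, trained on $n$ i.i.d.\ samples from a distribution
$\mathcal{D}$. An expressiveness-based accuracy bound has the schematic shape
\[
  \operatorname{Acc}_{\mathcal{D}}(\widehat{F})
  \;\ge\;
  \operatorname{Acc}_{\mathrm{train}}(\widehat{F})
  - \epsilon\bigl(n, \operatorname{size}(\mathcal{H})\bigr),
  \qquad
  \epsilon \to 0 \text{ as $n$ grows relative to } \operatorname{size}(\mathcal{H}),
\]
while the necessary condition says that, for some distributions $\mathcal{D}$,
nontrivial generalization requires
\[
  n \;\gtrsim\; \operatorname{size}_{\min}(\mathcal{D}),
\]
where $\operatorname{size}_{\min}(\mathcal{D})$ denotes the smallest network
size able to represent $\mathcal{D}$.
\end{document}
```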

📝 Abstract
The primary objective of learning methods is generalization. Classic uniform generalization bounds, which rely on VC-dimension or Rademacher complexity, fail to explain the notable fact that over-parameterized models in deep learning generalize well. On the other hand, algorithm-dependent generalization bounds, such as stability bounds, often rely on strict assumptions. To establish generalizability under less stringent assumptions, this paper investigates the generalizability of neural networks that minimize or approximately minimize empirical risk. We establish a lower bound on population accuracy based on the expressiveness of these networks, which indicates that, with a sufficiently large number of training samples and a sufficiently large network size, these networks, including over-parameterized ones, can generalize effectively. Additionally, we provide a necessary condition for generalization, demonstrating that, for certain data distributions, the quantity of training data required to ensure generalization exceeds the network size needed to represent the corresponding data distribution. Finally, we provide theoretical insights into several phenomena in deep learning, including robust generalization, the importance of over-parameterization, and the effect of the loss function on generalization.
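The data-model scale matching idea in the abstract can be probed with a quick, purely illustrative experiment (not taken from the paper): keep the architecture fixed and over-parameterized, grow the training set, and watch the train/test accuracy gap shrink. The scikit-learn setup below, including the synthetic data generator and all hyperparameters, is an assumption of this sketch.

```python
# Illustrative experiment only (not from the paper): probe the data-model
# scale matching idea by training an over-parameterized MLP on synthetic
# data with increasing sample sizes and measuring the train/test accuracy gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

SEED = 0
for n_train in [100, 1_000, 10_000]:
    # Growing training set, fixed-size held-out test set.
    X, y = make_classification(
        n_samples=n_train + 5_000, n_features=20, n_informative=10,
        random_state=SEED,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=n_train, random_state=SEED
    )

    # Deliberately over-parameterized relative to the smaller training sets.
    net = MLPClassifier(hidden_layer_sizes=(512, 512), max_iter=2_000,
                        random_state=SEED)
    net.fit(X_tr, y_tr)  # (approximately) minimizes empirical risk

    gap = net.score(X_tr, y_tr) - net.score(X_te, y_te)
    print(f"n_train={n_train:>6}  train-test accuracy gap={gap:.3f}")
```

On typical runs the gap is largest for the smallest training set and shrinks as the sample size grows, which is the qualitative behavior the expressiveness-based bound above describes.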
Problem

Research questions and friction points this paper is trying to address.

Explains the generalizability of over-parameterized neural networks.
Establishes a lower bound on population accuracy based on network expressiveness.
Provides a necessary condition for generalization in deep learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Establishes a lower bound on population accuracy based on expressive capacity
Analyzes neural networks that minimize or approximately minimize empirical risk
Provides a necessary condition for generalization (data-model scale matching)
🔎 Similar Papers
No similar papers found.
Lijia Yu
Key Laboratory of System Software of Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences
Yibo Miao
Shanghai Jiao Tong University; Moonshot
Deep Learning · Natural Language Processing · Large Language Models
Yifan Zhu
Beijing University of Posts and Telecommunications
PEFT of LLMs · Graph RAG · Graph mining
Xiao-Shan Gao
AMSS, CAS
Automated Reasoning · Symbolic Computation · Machine Learning Theory
Lijun Zhang
Key Laboratory of System Software of Chinese Academy of Sciences, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Institute of AI for Industries, Chinese Academy of Sciences