🤖 AI Summary
Problem: Densely connected deep neural networks (DNNs) lack rigorous mathematical modeling and convergence analysis in the infinite-depth limit.
Method: This paper introduces the dense non-local (DNL) framework, the first to model densely connected networks as systems of nonlinear integral equations and to characterize their training dynamics from an optimal control perspective. The analysis combines optimal control theory, a piecewise-linear extension in depth, and Γ-convergence.
Contributions/Results: The paper rigorously proves that (1) the optimal value of the empirical risk converges as the depth tends to infinity, and (2) a subsequence of the corresponding minimizers converges weakly to a solution of the continuous-time optimal control problem. The DNL framework reveals an intrinsic stability mechanism of dense connectivity in the deep limit and establishes the first convergence guarantee, and with it a theoretical foundation, for densely connected DNNs grounded in a continuous-depth limit.
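To make the integral-equation viewpoint concrete, here is a minimal sketch in our own notation (a simplified illustrative form, not necessarily the paper's exact equations). A densely connected network whose $k$-th layer aggregates the transformed states of all earlier layers can be written as the discrete recursion

$$
x_k \;=\; x_0 \;+\; \frac{1}{L}\sum_{j=0}^{k-1} K\!\Big(\tfrac{k}{L}, \tfrac{j}{L}\Big)\, f\big(x_j, \theta_j\big), \qquad k = 1,\dots,L,
$$

and in the deep-layer limit $L \to \infty$ this formally becomes a nonlinear Volterra-type integral equation

$$
x(t) \;=\; x(0) \;+\; \int_0^{t} K(t,s)\, f\big(x(s), \theta(s)\big)\,\mathrm{d}s, \qquad t \in [0,1],
$$

in contrast to the ODE $\dot{x}(t) = f(x(t), \theta(t))$ that underlies ResNet-style continuous-depth models.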
📝 Abstract
In deep learning, dense layer connectivity has become a key design principle in deep neural networks (DNNs), enabling efficient information flow and strong performance across a range of applications. In this work, we model densely connected DNNs mathematically and analyze their learning problems in the deep-layer limit. For broad applicability, we present our analysis for DNNs with densely connected layers and general non-local feature transformations within layers (with local feature transformations as special cases); we call this setting the dense non-local (DNL) framework, and it includes standard DenseNets and their variants as special cases. In this formulation, densely connected networks are modeled as nonlinear integral equations, in contrast to the ordinary differential equation viewpoint commonly adopted in prior works. We study the associated training problems from an optimal control perspective and prove convergence results from the network learning problem to its continuous-time counterpart. In particular, we show convergence of the optimal values and subsequence convergence of the minimizers, using a piecewise-linear extension and $\Gamma$-convergence analysis. Our results provide a mathematical foundation for understanding densely connected DNNs and further suggest that such architectures offer stability when training deep models.
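For readers who prefer code, below is a minimal NumPy sketch of a densely connected forward pass in the spirit of the recursion above. All names (`dense_forward`, `kernel`, `f`) and the concrete aggregation rule are our own illustrative choices for this sketch, not the paper's architecture.

```python
import numpy as np

def f(x, theta):
    """Layer-wise feature transformation (illustrative choice: affine map followed by tanh)."""
    W, b = theta
    return np.tanh(W @ x + b)

def dense_forward(x0, thetas, kernel):
    """Forward pass of a densely connected network of depth L = len(thetas).

    Each layer's state aggregates the transformed states of *all* earlier layers,
    weighted by kernel(t_k, t_j); for large L the sum approximates the
    Volterra-type integral equation sketched in the summary above.
    """
    L = len(thetas)
    states = [x0]
    for k in range(1, L + 1):
        t_k = k / L
        # Weighted sum over all previous layer states: dense (nonlocal-in-depth) connectivity.
        increment = sum(
            kernel(t_k, j / L) * f(states[j], thetas[j]) for j in range(k)
        ) / L
        states.append(x0 + increment)
    return states[-1]

# Toy usage: a depth-32 network acting on a 4-dimensional input.
rng = np.random.default_rng(0)
d, L = 4, 32
thetas = [(rng.standard_normal((d, d)) / np.sqrt(d), np.zeros(d)) for _ in range(L)]
kernel = lambda t, s: np.exp(-(t - s))  # illustrative smooth weighting in depth
y = dense_forward(rng.standard_normal(d), thetas, kernel)
print(y.shape)  # (4,)
```

As the depth `L` grows, the weighted sum over all earlier layer states approaches the integral above; this discrete-to-continuum passage is the kind of limit the paper's convergence analysis concerns.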