🤖 AI Summary
This work investigates how the approximation capacity of deep residual networks varies with depth by recasting function approximation, in a continuous-time dynamical systems setting, as a geometric control problem: approximating a diffeomorphism by flows driven by a family of vector fields. By constructing an induced sub-Finsler manifold of diffeomorphisms, the authors use a variational principle to characterize the local geometric structure and quantify approximation efficiency via geodesic distance. They establish, for the first time, a quantitative link between the approximation rates of deep networks and sub-Finsler geometry on the group of diffeomorphisms. This connection reveals a fundamental distinction between the compositional/dynamical approximation mechanism inherent in deep learning and classical linear approximation theory, offering a novel geometric perspective on the expressive efficiency of deep architectures.
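As a rough sketch of the recasting described above (the symbols $v_{\theta_k}$, $T$, and $L$ are illustrative here, not necessarily the paper's notation): a depth-$L$ residual network with step size $T/L$ can be read as an Euler discretization of a controlled flow,

$$
x_{k+1} \;=\; x_k + \tfrac{T}{L}\, v_{\theta_k}(x_k), \qquad k = 0, \dots, L-1,
\qquad\longrightarrow\qquad
\dot x(t) \;=\; v_t\big(x(t)\big), \quad v_t \in \mathcal F,\ t \in [0, T],
$$

so that, in the limit of large depth, the network realizes the time-$T$ flow map of a time-varying vector field chosen from $\mathcal F$, and depth plays the role of the time horizon $T$. This is the standard dynamical-systems view of residual networks; the paper's control-theoretic setup then asks how small $T$ can be while the flow still approximates a given target diffeomorphism.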
📝 Abstract
We investigate the dependence of the approximation capacity of deep residual networks on their depth in a continuous dynamical systems setting. This can be formulated as the general problem of quantifying the minimal time horizon required to approximate a diffeomorphism by flows driven by a given family $\mathcal F$ of vector fields. We show that this minimal time can be identified as a geodesic distance on a sub-Finsler manifold of diffeomorphisms, where the local geometry is characterised by a variational principle involving $\mathcal F$. This connects how efficiently a target relationship can be learned to its compatibility with the chosen architecture. Further, the results suggest that the key approximation mechanism in deep learning, namely the approximation of functions by composition or dynamics, differs in a fundamental way from classical linear approximation theory: linear spaces and norm-based rate estimates are replaced by manifolds and geodesic distances.
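To make the geodesic identification concrete, here is a sketch of the kind of variational formulation the abstract describes; the paper's precise definitions may differ. The minimal time to approximately reach a target diffeomorphism $\varphi$ by flows of fields in $\mathcal F$ is

$$
T_{\mathcal F}(\varphi) \;=\; \inf\Big\{\, T > 0 \;:\; \dot\phi_t = v_t \circ \phi_t,\ \ v_t \in \mathcal F,\ \ \phi_0 = \mathrm{id},\ \ \phi_T \approx \varphi \ \text{(in a suitable topology)} \,\Big\},
$$

and the result identifies this with a geodesic distance $d_{\mathcal F}(\mathrm{id}, \varphi)$ for a sub-Finsler metric whose infinitesimal cost is determined by $\mathcal F$ through a variational principle. One illustrative (assumed, not the paper's stated) choice of local cost is the Minkowski gauge of $\mathcal F$'s convex hull:

$$
\|V\|_{\mathcal F} \;=\; \inf\big\{\, \lambda > 0 \;:\; V/\lambda \in \overline{\operatorname{conv}}\,\mathcal F \,\big\}, \qquad
d_{\mathcal F}(\mathrm{id}, \varphi) \;=\; \inf_{\phi:\ \phi_0 = \mathrm{id},\ \phi_T = \varphi} \int_0^T \big\| \dot\phi_t \circ \phi_t^{-1} \big\|_{\mathcal F}\, \mathrm{d}t,
$$

so that depth requirements translate into lengths of geodesics on the group of diffeomorphisms rather than into norm-based rates in a linear space.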