🤖 AI Summary
This work addresses the lack of quantitative characterization of the geometric and spectral structure of intermediate representations in deep residual networks, which hinders understanding of the interplay among depth, spectral decay, and stability. The authors model the residual architecture as a product of near-identity Jacobian matrices and map weights onto the cone of positive-definite matrices via Frobenius normalization. They introduce normalized top-radial Cartan coordinates and employ power-law fitting to characterize the eigenvalue spectrum. Building on Cartan orbits, they develop a geometric framework for power-law spectra and establish rigidity inequalities involving slackness, non-backtracking relaxation, and residual variation, revealing intrinsic connections to Fisher information and the Bures–Wasserstein metric. The analysis shows that, under a fixed depth budget, the power-law exponent drifts at order (log M)/L, yielding spectral tail bounds and estimates for the robust rank window transition at finite width, with measurable diagnostics enabled by static weight exponent profiles.
📝 Abstract
Deep residual architectures are modeled as products of near-identity Jacobians. This paper proves deterministic quotient-geometric estimates for singular spectra of Frobenius-normalized layer factors, emphasizing a normalized top-radial Cartan coordinate and fitted power-law chart.
Full-rank factors are mapped from $\mathrm{GL}(d)$ to the positive cone by $A\mapsto A^\top A$, then to ordered eigenvalue data. Under Frobenius normalization, exact power-law spectra form a trace-normalized Cartan orbit. This orbit is a Gibbs family on ranks, a Fisher information line, and a Bures--Wasserstein curve with line element $d/4$ times Fisher information.
The main rigidity theorem is a slack-aware margin inequality: interface radial amplitude, non-backtracking slack, and signed residual variation control displacement of the fitted Cartan coordinate. In the exact-chart zero-slack case, a depth-$L$ budget gives exponent drift of order $(\log M)/L$; generally, slack and residual increments augment the bound.
We separate scalar top-radial from full-Cartan spectral control, which also needs Bures/Hellinger residual variation. We prove approximate-power-law and metric-chart versions, converse lower bounds, Fisher--KL/Bures action estimates, and near-identity expansions for normalized residual chains.
Near-identity results verify transport budgets; chart quality remains measurable. Effective rank is a spectral-energy quantile, giving finite-width power-law tail bounds and robust rank-window transition estimates. Empirical static-weight exponent profiles serve as diagnostics; full verification also requires interface budgets, slacks, and residuals for the same operator chain.