Neural Networks with Orthogonal Jacobian

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address degraded trainability in deep neural networks caused by vanishing/exploding gradients, this paper proposes a unified mathematical framework grounded in Jacobian orthogonality, ensuring input-output dynamic isometry almost everywhere for both feedforward and residual networks—without requiring explicit skip connections to achieve residual-like training stability. Methodologically, the approach enforces approximate Jacobian orthogonality via orthogonal initialization, Jacobian regularization, and partially isometric architectural design, applicable to both nonlinear feedforward and residual networks. Experiments demonstrate that orthogonal initialization alone substantially stabilizes training of extremely deep networks, matching or exceeding the performance of leading regularization techniques and yielding competitive results across diverse deep architectures. The core contribution is the first identification of *global Jacobian orthogonality* as the fundamental geometric condition governing trainability in deep networks.
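The orthogonal initialization the summary mentions can be sketched with the standard QR-based recipe below. This is a common construction (drawing a Gaussian matrix and orthogonalizing it), not necessarily the paper's exact procedure; the function name and shapes are illustrative.

```python
import numpy as np

def orthogonal_init(rows, cols, rng=None):
    """Sample a weight matrix with orthonormal columns via QR decomposition.

    A common orthogonal-initialization recipe; the paper's exact
    construction may differ.
    """
    rng = np.random.default_rng(rng)
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Multiply columns by the sign of r's diagonal so the distribution
    # over orthogonal matrices is uniform rather than biased by QR.
    q *= np.sign(np.diag(r))
    return q if rows >= cols else q.T

W = orthogonal_init(64, 64, rng=0)
# Orthonormal columns (W.T @ W = I) mean the layer neither shrinks
# nor amplifies gradient norms at initialization.
print(np.allclose(W.T @ W, np.eye(64), atol=1e-8))  # True
```

Because `W.T @ W = I`, backpropagated gradients pass through the linear part of each layer with their norms preserved, which is the mechanism behind the training stability the summary describes.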

📝 Abstract
Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features. Yet, training them via backpropagation is often hindered by vanishing or exploding gradients. Existing remedies, such as orthogonal or variance-preserving initialisation and residual architectures, allow for a more stable gradient propagation and the training of deeper models. In this work, we introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks, whose input-to-output Jacobian matrices are exactly orthogonal almost everywhere. Such a constraint forces the resulting networks to achieve perfect dynamical isometry and train efficiently despite being very deep. Our formulation not only recovers standard architectures as particular cases but also yields new designs that match the trainability of residual networks without relying on conventional skip connections. We provide experimental evidence that perfect Jacobian orthogonality at initialisation is sufficient to stabilise training and achieve competitive performance. We compare this strategy to networks regularised to maintain the Jacobian orthogonality and obtain comparable results. We further extend our analysis to a class of networks well-approximated by those with orthogonal Jacobians and introduce networks with Jacobians representing partial isometries. These generalized models are then shown to maintain the favourable trainability properties.
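The abstract's "partial isometries" can be illustrated with a small numerical check. A tall matrix with orthonormal columns is the canonical example: it preserves norms on its input space even though it is not square and not fully orthogonal. This is a generic linear-algebra sketch, not the paper's specific architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
# A 12x4 matrix with orthonormal columns acts as a partial isometry:
# it embeds a 4-dimensional input space into 12 dimensions without
# changing vector lengths.
V = np.linalg.qr(rng.standard_normal((12, 4)))[0]
x = rng.standard_normal(4)

print(np.isclose(np.linalg.norm(V @ x), np.linalg.norm(x)))  # True
print(np.allclose(V.T @ V, np.eye(4)))   # True: isometry on inputs
print(np.allclose(V @ V.T, np.eye(12)))  # False: rank-4 projection, not I
```

Layers whose Jacobians behave this way still preserve gradient norms along the directions they pass through, which is why the relaxation from orthogonal Jacobians to partial isometries can keep the favourable trainability properties.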
Problem

Research questions and friction points this paper is trying to address.

Stabilizing gradient flow in deep neural networks
Enforcing exact orthogonal Jacobian matrices in networks
Improving trainability without traditional skip connections
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Jacobian matrices ensure perfect dynamical isometry
Unified framework for stable deep neural network training
Generalized models maintain trainability without skip connections
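One plausible form of the Jacobian regularization referenced above is a Frobenius-norm penalty on the deviation of `J @ J.T` from the identity. The sketch below assumes this form and a single leaky-ReLU layer; the paper's exact regularizer and architecture may differ.

```python
import numpy as np

def jacobian_orthogonality_penalty(J):
    """||J @ J.T - I||_F^2: zero iff the rows of J are orthonormal.

    An assumed form of the Jacobian-orthogonality regularizer; the
    paper's exact penalty may differ.
    """
    d = J.shape[0]
    return float(np.sum((J @ J.T - np.eye(d)) ** 2))

rng = np.random.default_rng(0)
# Orthogonal weight matrix via QR decomposition.
W = np.linalg.qr(rng.standard_normal((8, 8)))[0]
x = rng.standard_normal(8)

# Jacobian of y = phi(W x) for leaky ReLU: J = diag(phi'(W x)) @ W.
slope = np.where(W @ x > 0, 1.0, 0.2)
J = np.diag(slope) @ W

print(jacobian_orthogonality_penalty(W))  # ~0: W itself is orthogonal
# The nonlinearity rescales rows, so J is typically no longer an exact
# isometry and the penalty is positive whenever any unit is inactive.
print(jacobian_orthogonality_penalty(J))
```

Minimizing such a penalty during training pushes each layer's Jacobian back toward orthogonality, which is the mechanism by which regularized networks in the summary maintain dynamical isometry without skip connections.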