LION-DG: Layer-Informed Initialization with Deep Gradient Protocols for Accelerated Neural Network Training

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability that conventional, layer-agnostic initialization causes in deep supervision architectures, where it fails to account for structural differences between the backbone and the auxiliary classification heads. This mismatch lets harmful gradients flow from the auxiliary heads during early training, degrading optimization stability. To resolve this, the authors propose LION-DG, a layer-aware initialization strategy that applies He initialization to the backbone while zero-initializing the weights of the auxiliary heads. This design yields an implicit "gradient warm-up" mechanism, termed gradient awakening, without introducing extra hyperparameters or computational overhead. LION-DG is compatible with existing schemes such as LSUV and significantly accelerates convergence on CIFAR-10/100, achieving speedups of 8.3% for DenseNet-DS and 11.3% for ResNet-DS. When combined with LSUV, it attains the best accuracy among the evaluated configurations, 81.92% on CIFAR-10.

📝 Abstract
Weight initialization remains decisive for neural network optimization, yet existing methods are largely layer-agnostic. We study initialization for deeply-supervised architectures with auxiliary classifiers, where untrained auxiliary heads can destabilize early training through gradient interference. We propose LION-DG, a layer-informed initialization that zero-initializes auxiliary classifier heads while applying standard He-initialization to the backbone. We prove that this implements Gradient Awakening: auxiliary gradients are exactly zero at initialization, then phase in naturally as weights grow -- providing an implicit warmup without hyperparameters. Experiments on CIFAR-10 and CIFAR-100 with DenseNet-DS and ResNet-DS architectures demonstrate: (1) DenseNet-DS: +8.3% faster convergence on CIFAR-10 with comparable accuracy, (2) Hybrid approach: Combining LSUV with LION-DG achieves best accuracy (81.92% on CIFAR-10), (3) ResNet-DS: Positive speedup on CIFAR-100 (+11.3%) with side-tap auxiliary design. We identify architecture-specific trade-offs and provide clear guidelines for practitioners. LION-DG is simple, requires zero hyperparameters, and adds no computational overhead.
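The core mechanism is simple to illustrate. The sketch below is not the authors' code; it is a minimal NumPy illustration (layer sizes and variable names are hypothetical) of the LION-DG rule: He-initialize the backbone, zero-initialize the auxiliary head. Because the head's weights are exactly zero, any error signal backpropagated through it contributes zero gradient to the backbone at initialization, while the head's own weight gradient is nonzero, so the head "awakens" and its gradients phase in as training proceeds.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out, rng):
    # He initialization: N(0, 2/fan_in), the standard choice for ReLU backbones
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# Backbone layer: He-initialized (sizes chosen only for illustration)
W_backbone = he_init(64, 32, rng)

# Auxiliary classifier head: zero-initialized (the LION-DG rule)
W_aux = np.zeros((32, 10))

# Forward a batch of backbone features through the auxiliary head
h = rng.normal(size=(4, 32))      # stand-in for backbone features
logits = h @ W_aux                # all-zero logits at initialization

# Backpropagate an error signal `delta` (stand-in for dL/dlogits):
# the gradient reaching the backbone is delta @ W_aux.T, which is
# exactly zero, so the untrained head cannot perturb early backbone updates.
delta = rng.normal(size=(4, 10))
grad_to_backbone = delta @ W_aux.T

# The head's own gradient h.T @ delta is generally nonzero, so W_aux
# moves away from zero and auxiliary gradients phase in ("awaken").
grad_W_aux = h.T @ delta
```

This implicit warm-up needs no schedule or extra hyperparameters: the magnitude of the auxiliary gradient reaching the backbone grows only as fast as `W_aux` itself grows under its own updates.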
Problem

Research questions and friction points this paper is trying to address.

weight initialization
auxiliary classifiers
gradient interference
deeply-supervised architectures
neural network training
Innovation

Methods, ideas, or system contributions that make the work stand out.

layer-informed initialization
gradient awakening
deeply-supervised networks
auxiliary classifiers
zero-initialization