🤖 AI Summary
This work investigates the statistical properties of deep neural networks at initialization, focusing on the intrinsic relationship between initial-guessing bias (IGB) and trainability. Method: We theoretically establish that IGB is not a defect but a necessary condition for efficient training; optimal initialization inherently exhibits a class preference rather than the conventionally assumed neutrality. We integrate IGB into the mean-field (MF) theory framework for the first time, unifying the modeling of initial bias and gradient stability, and rigorously derive their quantitative correspondence. The MF/IGB framework is further extended to multi-node nonlinear activation functions and to architectures with pooling layers, yielding a novel initialization scheme that jointly ensures bias controllability and optimization stability. Results: Empirical validation on ResNet and CNN architectures demonstrates significantly accelerated convergence and improved final test accuracy.
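For illustration only, the minimal sketch below (assuming PyTorch and a toy CNN that is not the paper's architecture) shows how IGB can be observed empirically: an untrained, freshly initialized classifier is fed random inputs and the fraction of inputs assigned to each class is recorded. A strongly peaked class-frequency vector reflects the class preference described above, whereas a roughly uniform one would indicate a neutral initialization.

```python
# Hypothetical sketch: quantifying initial-guessing bias (IGB) in an untrained classifier.
# Random Gaussian inputs stand in for "the input space"; under IGB, one class dominates
# the untrained network's predictions.
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 10
model = nn.Sequential(                      # small illustrative CNN, not the paper's model
    nn.Conv2d(3, 32, 3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, num_classes),
)

with torch.no_grad():
    x = torch.randn(2048, 3, 32, 32)        # random inputs sampled from a standard Gaussian
    preds = model(x).argmax(dim=1)
    freq = torch.bincount(preds, minlength=num_classes).float() / len(preds)

# ~0.1 per class means no bias; a peaked distribution signals IGB at initialization.
print(freq)
```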
📝 Abstract
Understanding the statistical properties of deep neural networks (DNNs) at initialization is crucial for elucidating both their trainability and the intrinsic architectural biases they encode prior to data exposure. Mean-field (MF) analyses have demonstrated that the parameter distribution in randomly initialized networks dictates whether gradients vanish or explode. Concurrently, untrained DNNs have been found to exhibit an initial-guessing bias (IGB), in which large regions of the input space are assigned to a single class. In this work, we derive a theoretical proof establishing the correspondence between IGB and previous MF theories, thereby connecting a network's prejudice toward specific classes with the conditions for fast and accurate learning. This connection yields the counter-intuitive conclusion that the initialization which optimizes trainability is necessarily biased rather than neutral. Furthermore, we extend the MF/IGB framework to multi-node activation functions, offering practical guidelines for designing initialization schemes that ensure stable optimization in architectures employing max- and average-pooling layers.
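As context for the MF side of this correspondence, the sketch below illustrates the standard mean-field prescription for ReLU layers (He/Kaiming variance scaling, with weight variance 2/fan-in), which keeps pre-activation statistics stable across depth. This is a generic, well-known scheme shown only for orientation; it is not the paper's extended initialization for multi-node activations or pooling layers, and the network sizes are illustrative assumptions.

```python
# Minimal sketch of mean-field-style critical initialization for ReLU layers
# (He/Kaiming variance scaling). Stable per-layer activation scale is the MF
# condition under which gradients neither vanish nor explode.
import torch
import torch.nn as nn

def critical_init_(module: nn.Module) -> None:
    """Rescale weights so pre-activation variance is preserved layer to layer."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(                      # toy depth/width chosen for illustration
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
model.apply(critical_init_)

# Sanity check: the activation scale stays roughly constant across depth.
with torch.no_grad():
    h = torch.randn(1024, 512)
    for layer in model:
        h = layer(h)
        print(type(layer).__name__, round(h.std().item(), 3))
```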