Signal Preserving Weight Initialization for Odd-Sigmoid Activations

๐Ÿ“… 2025-09-26
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Odd sigmoidal activation functions (e.g., tanh, sinh) suffer from saturation, variance collapse, and learning-rate sensitivity in deep networks. Method: We propose a signal-preserving weight initialization scheme, formally defining the odd sigmoidal function class and deriving a closed-form noise scale based on its statistical properties to stabilize activation variance across target depthsโ€”without batch normalization. Contribution/Results: Our method unifies initialization for all members of this class, relaxing the implicit monotonicity-and-boundedness assumptions underlying Xavier/He initialization. Experiments demonstrate significantly improved convergence robustness and data efficiency in deep architectures and few-shot settings, thereby expanding the design space and practical applicability of nonlinear activation functions.

๐Ÿ“ Abstract
Activation functions critically influence trainability and expressivity, and recent work has therefore explored a broad range of nonlinearities. However, activations and weight initialization are interdependent: without an appropriate initialization method, nonlinearities can cause saturation, variance collapse, and increased learning rate sensitivity. We address this by defining an odd sigmoid function class and, given any activation f in this class, proposing an initialization method tailored to f. The method selects a noise scale in closed form so that forward activations remain well dispersed up to a target layer, thereby avoiding collapse to zero or saturation. Empirically, the approach trains reliably without normalization layers, exhibits strong data efficiency, and enables learning for activations under which standard initialization methods (Xavier, He, Orthogonal) often do not converge reliably.
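The abstract's core idea, picking a weight noise scale in closed form so that forward activations stay well dispersed with depth, can be illustrated with a simple Gaussian mean-field calculation. The paper's actual formula is not reproduced here; the sketch below uses a generic variance-matching rule (weight std `1 / sqrt(fan_in * E[f(z)^2])`, which makes unit pre-activation variance a fixed point of the forward pass) as an assumed stand-in:

```python
import numpy as np

def variance_matched_std(f, fan_in, num_samples=1_000_000, seed=0):
    """Illustrative per-weight std for an odd activation f.

    Under a Gaussian mean-field view of a fully connected layer, the
    pre-activation variance evolves as
        q' = fan_in * sigma_w**2 * E[f(sqrt(q) * z)**2],  z ~ N(0, 1).
    Choosing sigma_w = 1 / sqrt(fan_in * E[f(z)**2]) makes q = 1 a
    fixed point, so the forward signal neither collapses to zero nor
    saturates. (A stand-in for the paper's closed-form noise scale,
    not its exact method.)
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(num_samples)
    return 1.0 / np.sqrt(fan_in * np.mean(f(z) ** 2))

# Example: for tanh with fan_in = 256 this gives a scale larger than
# Xavier's 1 / sqrt(256) ~= 0.0625, compensating for tanh's squashing.
sigma = variance_matched_std(np.tanh, fan_in=256)
```

Because the rule only needs the second moment of `f` under a standard normal input, it applies uniformly to any member of the odd activation class rather than being derived per-activation.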
Problem

Research questions and friction points this paper is trying to address.

Proposes weight initialization for odd sigmoid activations to prevent saturation
Avoids signal collapse and variance issues in deep neural networks
Enables reliable training without normalization layers for improved data efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines odd sigmoid activation function class
Proposes tailored weight initialization method
Ensures signal preservation without normalization layers
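The claimed benefit, stable signal propagation without normalization layers, can be sanity-checked with a toy forward pass: push a random batch through a deep tanh MLP and compare the final activation spread under Xavier against a variance-matched scale. The width, depth, and the variance-matched rule itself are assumptions of this sketch, used as an illustrative proxy for the paper's initializer:

```python
import numpy as np

def forward_std(act, sigma_w, width=512, depth=50, seed=0):
    """Std of the final activations after `depth` linear + activation
    layers with i.i.d. N(0, sigma_w^2) weights (no biases, no norm)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((64, width))
    for _ in range(depth):
        w = rng.normal(0.0, sigma_w, size=(width, width))
        x = act(x @ w)
    return float(x.std())

width = 512
z = np.random.default_rng(1).standard_normal(1_000_000)
xavier = np.sqrt(2.0 / (width + width))                    # Glorot normal
matched = 1.0 / np.sqrt(width * np.mean(np.tanh(z) ** 2))  # variance-matched proxy

# With Xavier, tanh's squashing shrinks the signal slightly at every
# layer, so the activation std decays with depth; the matched scale
# keeps it near its fixed point.
```

This mirrors the paper's framing: the failure mode is not the activation itself but an initialization that ignores how much variance the nonlinearity absorbs per layer.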
๐Ÿ”Ž Similar Papers
No similar papers found.