Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address severe activation saturation and unstable gradient propagation in deep tanh-activated neural networks, this paper proposes a robust weight initialization method based on fixed-point analysis of the scaled activation function tanh(ax). It is the first work to systematically incorporate fixed-point theory into tanh network initialization design: the tunable scaling parameter *a* dynamically mitigates saturation, ensuring stable forward signal propagation and backward gradient propagation. The method introduces two key innovations: (i) theoretical modeling grounded in the variance-preserving property of scaled tanh under layer-wise propagation, and (ii) engineering optimization via depth-adaptive selection of *a*. Extensive experiments demonstrate that, compared to Xavier (including normalized Xavier) initialization, the proposed method achieves significantly faster convergence, superior depth robustness, and higher data efficiency on both standard multi-class classification tasks and physics-informed neural networks (PINNs).
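The fixed-point structure behind the summary above can be explored with a short sketch. This is a generic numerical illustration, not the paper's procedure: it solves x = tanh(a·x) by plain fixed-point iteration to show how the nonzero fixed points behave as a grows.

```python
import math

def tanh_fixed_point(a, x0=1.0, tol=1e-12, max_iter=10_000):
    """Solve x = tanh(a*x) by fixed-point iteration starting from x0.

    For a <= 1 the only fixed point is x = 0; for a > 1 two additional
    symmetric nonzero fixed points appear and move toward +/-1 as a
    grows, i.e. toward the saturated regime of tanh.
    """
    x = x0
    for _ in range(max_iter):
        x_next = math.tanh(a * x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

for a in (0.5, 1.5, 3.0):
    print(f"a = {a}: fixed point ~ {tanh_fixed_point(a):.6f}")
```

Larger a pushes the nonzero fixed points toward ±1, i.e. into saturation; the paper's contribution is choosing a so that propagated signals settle away from that regime.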

📝 Abstract
Increasing a neural network's depth can improve generalization performance, but training deep networks is challenging due to gradient and signal propagation issues. Extensive theoretical research and various methods have been introduced to address these challenges. Despite these advances, effective weight initialization methods for tanh neural networks remain insufficiently investigated. This paper presents a novel weight initialization method for neural networks with the tanh activation function. Based on an analysis of the fixed points of the function $\tanh(ax)$, the proposed method determines values of $a$ that mitigate activation saturation. A series of experiments on various classification datasets and physics-informed neural networks demonstrates that the proposed method outperforms Xavier initialization methods (with or without normalization) in terms of robustness across different network sizes, data efficiency, and convergence speed. Code is available at https://github.com/1HyunwooLee/Tanh-Init
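To see why the choice of a matters for signal propagation, here is a small self-contained experiment. It is a toy sketch under assumed settings (width 128, depth 50, Xavier-style weight variance 1/width, and an illustrative a = 1.2 that is not the paper's depth-adaptive rule): it propagates one input through a deep tanh(a·x) network and compares the final activation scale for a = 1.0 versus a = 1.2.

```python
import math
import random
import statistics

def forward_std(a, depth=50, width=128, seed=0):
    """Propagate one random input through a deep MLP with activation
    tanh(a*x) and Xavier-style weights (variance 1/width), and return
    the standard deviation of the final layer's activations."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    sigma = 1.0 / math.sqrt(width)  # Xavier-style scale for tanh layers
    for _ in range(depth):
        w = [[rng.gauss(0.0, sigma) for _ in range(width)]
             for _ in range(width)]
        # h <- tanh(a * W h), one layer of forward propagation
        h = [math.tanh(a * sum(wij * hj for wij, hj in zip(row, h)))
             for row in w]
    return statistics.pstdev(h)

plain = forward_std(a=1.0)   # standard tanh: signal shrinks with depth
scaled = forward_std(a=1.2)  # scaled tanh: signal settles at a nonzero level
print(f"std(a=1.0) = {plain:.3f}, std(a=1.2) = {scaled:.3f}")
```

With plain tanh the activation scale contracts layer by layer, while a modest a > 1 lets the signal settle at a stable nonzero level, mirroring the fixed-point picture in the abstract: the stable fixed point of the variance map sits away from zero without pushing activations into saturation.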
Problem

Research questions and friction points this paper is trying to address.

Gradient and signal propagation become unstable as tanh network depth grows.
Activation saturation in deep tanh networks is not adequately handled by existing initialization schemes.
Xavier-style initializations fall short in robustness across network sizes, data efficiency, and convergence speed.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel weight initialization for tanh networks based on fixed-point analysis of tanh(ax)
Scaling parameter a chosen to mitigate activation saturation
Outperforms Xavier initialization (with or without normalization) in robustness, data efficiency, and convergence speed