Sinusoidal Initialization, Time for a New Start

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Random weight initialization schemes (e.g., Glorot, He) for deep neural networks often yield imbalanced weight and activation distributions across layers, impairing convergence speed, training stability, and generalization. To address this, the authors propose Sinusoidal initialization, a deterministic method that constructs structured weight matrices from sinusoidal functions to equalize activation distributions within each layer and balance weights across layers. The approach introduces a periodic deterministic structure, requires no hyperparameter tuning, and is broadly compatible with CNNs, Vision Transformers (ViTs), and large language models (LLMs). Across diverse architectures and tasks, experiments show that it improves final validation accuracy by an average of 4.8%, accelerates convergence by 20.9%, and enhances both training stability and generalization.
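
This page does not reproduce the paper's construction, so the sketch below is only an illustration of the idea under stated assumptions: a weight matrix filled deterministically from sinusoids, with row-dependent frequency and phase, and a Glorot-style variance scale. The function `sinusoidal_init` and every formulaic detail here are assumptions, not the paper's exact method.

```python
import numpy as np

def sinusoidal_init(fan_out: int, fan_in: int) -> np.ndarray:
    """Hypothetical sketch of a deterministic sinusoidal initializer.

    Each output row samples a sine wave at its own frequency and phase,
    giving a structured, periodic weight pattern; a Glorot-style scale
    keeps the variance comparable to standard random initializers.
    This is an illustration, NOT the paper's exact formula.
    """
    j = np.arange(fan_in)                  # input-index positions
    i = np.arange(fan_out)[:, None]        # one row per output neuron
    freq = (i + 1) * np.pi / fan_in        # row-dependent frequency
    phase = i * np.pi / fan_out            # row-dependent phase shift
    W = np.sin(freq * j + phase)           # structured periodic matrix
    # sin over full periods has variance ~1/2; rescale to unit variance,
    # then apply a Glorot-style factor so magnitudes match common inits.
    return np.sqrt(2.0) * np.sqrt(2.0 / (fan_in + fan_out)) * W

W = sinusoidal_init(128, 64)
print(W.shape, float(W.mean()), float(W.std()))
```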

📝 Abstract
Initialization plays a critical role in Deep Neural Network training, directly influencing convergence, stability, and generalization. Common approaches such as Glorot and He initializations rely on randomness, which can produce uneven weight distributions across layer connections. In this paper, we introduce the Sinusoidal initialization, a novel deterministic method that employs sinusoidal functions to construct structured weight matrices expressly to improve the spread and balance of weights throughout the network while simultaneously fostering a more uniform, well-conditioned distribution of neuron activation states from the very first forward pass. Because Sinusoidal initialization begins with weights and activations that are already evenly and efficiently utilized, it delivers consistently faster convergence, greater training stability, and higher final accuracy across a wide range of models, including convolutional neural networks, vision transformers, and large language models. On average, our experiments show an increase of 4.8% in final validation accuracy and 20.9% in convergence speed. By replacing randomness with structure, this initialization provides a stronger and more reliable foundation for Deep Learning systems.
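
The abstract claims a "uniform, well-conditioned distribution of neuron activation states from the very first forward pass." One way to probe that claim, reusing the hypothetical `sinusoidal_init` sketched above, is to push a single batch through a few tanh layers and compare the per-neuron activation spread against a standard He initializer. This is a diagnostic harness under the same assumptions as the sketch, not a reproduction of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_out: int, fan_in: int) -> np.ndarray:
    # Standard He (Kaiming) normal initialization, used as the baseline.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

def first_pass_spread(init_fn, depth=4, width=256, batch=512):
    """Run one batch through `depth` tanh layers initialized by `init_fn`
    and report the min/max per-neuron activation std: the closer the two
    values, the more evenly the neurons are utilized at the first pass.
    """
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        x = np.tanh(x @ init_fn(width, width).T)
    s = x.std(axis=0)
    return round(float(s.min()), 3), round(float(s.max()), 3)

print("He baseline:", first_pass_spread(he_init))
print("Sinusoidal: ", first_pass_spread(sinusoidal_init))
```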
Problem

Research questions and friction points this paper is trying to address.

Random initialization (Glorot, He) produces uneven weight distributions across layer connections
Imbalanced weights and activations slow convergence and destabilize training
Poorly conditioned starting states limit final accuracy across architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sinusoidal functions for structured weight matrices
Deterministic method replacing random initialization
Improves weight spread and activation balance