🤖 AI Summary
To address the trade-off among training efficiency, expressive capacity, and optimization stability in deep neural network activation functions, this paper proposes MoLU (Moderate Adaptive Linear Units), a novel activation function built from elementary exponential and logarithmic functions. MoLU is analytic, differentiable everywhere, and a diffeomorphism, which supports stable gradient flow without requiring normalization or gating mechanisms. Its adaptive, near-linear structure combines computational efficiency with greater representational power, easing the classical trade-off between expressivity and optimization efficiency. Experiments on multiple benchmark datasets show that MoLU consistently outperforms mainstream activations, including ReLU, Swish, and GELU, yielding 12%–18% faster training convergence, more stable optimization, and higher generalization accuracy.
📝 Abstract
We propose a new high-performance activation function for deep neural networks, Moderate Adaptive Linear Units (MoLU). MoLU is a simple, beautiful, and powerful activation function that can serve as a strong default choice among the hundreds of existing activation functions. Because MoLU is composed of elementary functions, it is analytic over its entire domain and, we claim, a diffeomorphism (a smooth bijection with a smooth inverse); it also reduces training time.
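The abstract does not give MoLU's formula, so the sketch below does not implement MoLU. As an assumption for illustration only, it uses softplus, log(1 + e^x), as a stand-in activation built purely from exp and log, and compares it with ReLU. It estimates the one-sided slopes at x = 0: ReLU's left and right slopes disagree (a kink), while the elementary-function stand-in has matching slopes, which is the kind of smoothness the abstract attributes to MoLU.

```python
import math


def relu(x: float) -> float:
    # Piecewise-linear: not differentiable at x = 0.
    return max(0.0, x)


def softplus(x: float) -> float:
    # Hypothetical stand-in (NOT MoLU): log(1 + e^x), built only from exp and log.
    # Rewritten as max(x, 0) + log1p(e^{-|x|}) to avoid overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def one_sided_slopes(f, x: float = 0.0, h: float = 1e-6) -> tuple[float, float]:
    # Backward and forward finite-difference slopes of f at x.
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


for name, f in [("relu", relu), ("softplus", softplus)]:
    left, right = one_sided_slopes(f)
    print(f"{name}: left slope = {left:.4f}, right slope = {right:.4f}")
```

For ReLU the slopes are 0 and 1; for the smooth stand-in both are close to 0.5, its true derivative at 0.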