Moderate Adaptive Linear Units (MoLU)

📅 2023-02-27

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the trade-off between training efficiency, expressive capacity, and optimization stability in deep neural network activation functions, this paper proposes MoLU, a novel activation function constructed from elementary exponential and logarithmic functions. MoLU is analytic, everywhere differentiable, and a diffeomorphism, ensuring stable gradient flow without requiring normalization or gating mechanisms. Its adaptive, approximately piecewise-linear shape achieves both computational efficiency and enhanced representational power, overcoming the classical expressivity–optimization-efficiency bottleneck. Extensive experiments across multiple benchmark datasets demonstrate that MoLU consistently outperforms mainstream activations (including ReLU, Swish, and GELU), yielding 12%–18% faster training convergence, improved optimization stability, and higher generalization accuracy.

📝 Abstract

We propose a new high-performance activation function, Moderate Adaptive Linear Units (MoLU), for deep neural networks. MoLU is a simple, beautiful, and powerful activation function that can serve as a good main activation function among the hundreds of activation functions available. Because MoLU is composed of elementary functions, it is not only a diffeomorphism (i.e., analytic over its whole domain) but also reduces training time.
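The abstract above does not state the MoLU formula. As a minimal sketch only, the snippet below assumes the form x * tanh(alpha * exp(beta * x)) with alpha = beta = 2; both the formula and the default parameters are assumptions not confirmed by this page, and the function name `molu` is hypothetical. It compares the assumed function with ReLU at a few points to illustrate what a smooth activation built from elementary functions might look like.

```python
import math


def molu(x: float, alpha: float = 2.0, beta: float = 2.0) -> float:
    """ASSUMED MoLU form: x * tanh(alpha * exp(beta * x)).

    The formula and the defaults alpha = beta = 2 are assumptions;
    the page above does not give them.
    """
    # Clamp the exponent so math.exp cannot overflow for large positive x.
    # tanh has already saturated to 1.0 well before this bound is reached.
    z = alpha * math.exp(min(beta * x, 50.0))
    return x * math.tanh(z)


def relu(x: float) -> float:
    """Standard ReLU, shown only for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Print the two activations side by side at a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  molu={molu(x):+.6f}  relu={relu(x):+.6f}")
```

Under these assumptions the function passes through the origin, stays close to zero for strongly negative inputs, and approaches the identity for large positive inputs, while remaining smooth everywhere.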

Problem

Research questions and friction points this paper is trying to address.

Proposing MoLU as a high-performance activation function

MoLU combines simplicity and analytic properties

MoLU reduces training time in deep networks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Moderate Adaptive Linear Units (MoLU)

Uses elementary functions for diffeomorphism

Reduces training time in deep networks

🔎 Similar Papers

Activator: GLU Activation Function as the Core Component of a Vision Transformer