🤖 AI Summary
To address the dying-neuron problem of ReLU and its variants, and to improve on the limited hidden-space variance control of self-gated activations such as GELU and Swish, this paper proposes the Gompertz Linear Unit (GoLU), the first self-gated activation built on the asymmetric sigmoidal Gompertz function $\mathrm{Gompertz}(x) = e^{-e^{-x}}$ and formulated as $\mathrm{GoLU}(x) = x \cdot \mathrm{Gompertz}(x)$. The asymmetry of the Gompertz gate preserves a near-linear response for positive inputs while attenuating negative inputs more strongly than the symmetric gates of GELU and Swish, thereby reducing hidden-layer variance and stabilizing gradient flow. Theoretical analysis and extensive empirical evaluation across six diverse tasks, including image classification, language modeling, semantic segmentation, object detection, instance segmentation, and diffusion modeling, demonstrate that GoLU consistently outperforms ReLU, Swish, and GELU in accuracy, convergence, and generalization.
📝 Abstract
Activation functions are fundamental elements of deep learning architectures as they significantly influence training dynamics. ReLU, while widely used, is prone to the dying neuron problem, which has been mitigated by variants such as LeakyReLU, PReLU, and ELU that better handle negative neuron outputs. Recently, self-gated activations like GELU and Swish have emerged as state-of-the-art alternatives, leveraging their smoothness to ensure stable gradient flow and prevent neuron inactivity. In this work, we introduce the Gompertz Linear Unit (GoLU), a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x \, \mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$. The GoLU activation leverages the asymmetry in the Gompertz function to reduce variance in the latent space more effectively than GELU and Swish, while preserving robust gradient flow. Extensive experiments across diverse tasks, including Image Classification, Language Modeling, Semantic Segmentation, Object Detection, Instance Segmentation, and Diffusion, highlight GoLU's superior performance relative to state-of-the-art activations, establishing it as a robust alternative to existing activation functions.
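
For concreteness, here is a minimal sketch of the activation exactly as defined above, written as a standalone PyTorch-style function. This is an illustrative implementation, not the authors' official (possibly fused/CUDA) kernel, and the function names are placeholders:

```python
import torch

def gompertz(x: torch.Tensor) -> torch.Tensor:
    # Gompertz(x) = exp(-exp(-x)): an asymmetric sigmoid-like gate in (0, 1)
    return torch.exp(-torch.exp(-x))

def golu(x: torch.Tensor) -> torch.Tensor:
    # GoLU(x) = x * Gompertz(x): the input gated by the Gompertz function
    return x * gompertz(x)

# Example usage: apply GoLU to a small range of inputs
x = torch.linspace(-3.0, 3.0, steps=7)
print(golu(x))
```

Because the gate takes the value $e^{-1} \approx 0.37$ at $x = 0$ (rather than $0.5$ as in Swish's sigmoid gate), negative and near-zero activations are damped more strongly, which is the source of the variance reduction described in the abstract.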