The Resurrection of the ReLU

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
ReLU suffers from the “dying neuron” problem: units that stop activating receive zero gradient and can never recover, degrading performance. To address this, we propose SUGAR: a backward-compatible activation method that retains standard ReLU in the forward pass—preserving sparsity and computational efficiency—while introducing a smooth surrogate gradient in the backward pass that replaces the conventional zero gradient and dynamically revives dead neurons. Crucially, SUGAR is the first to formalize the surrogate gradient as a ReLU-specific regularizer, systematically mitigating neuron death without altering forward propagation. The method is plug-and-play and compatible with mainstream architectures including VGG-16, ResNet-18, ConvNeXt, and Swin Transformer. Experiments across multiple vision tasks demonstrate that SUGAR consistently outperforms advanced activations such as GELU and SELU, yielding improved generalization, enhanced activation sparsity, and effective reactivation of previously dead ReLU units.

📝 Abstract
Modeling sophisticated activation functions within deep learning architectures has evolved into a distinct research direction. Functions such as GELU, SELU, and SiLU offer smooth gradients and improved convergence properties, making them popular choices in state-of-the-art models. Despite this trend, the classical ReLU remains appealing due to its simplicity, inherent sparsity, and other advantageous topological characteristics. However, ReLU units are prone to becoming irreversibly inactive - a phenomenon known as the dying ReLU problem - which limits their overall effectiveness. In this work, we introduce surrogate gradient learning for ReLU (SUGAR) as a novel, plug-and-play regularizer for deep architectures. SUGAR preserves the standard ReLU function during the forward pass but replaces its derivative in the backward pass with a smooth surrogate that avoids zeroing out gradients. We demonstrate that SUGAR, when paired with a well-chosen surrogate function, substantially enhances generalization performance over convolutional network architectures such as VGG-16 and ResNet-18, providing sparser activations while effectively resurrecting dead ReLUs. Moreover, we show that even in modern architectures like ConvNeXt and Swin Transformer - which typically employ GELU - substituting these with SUGAR yields competitive and even slightly superior performance. These findings challenge the prevailing notion that advanced activation functions are necessary for optimal performance. Instead, they suggest that the conventional ReLU, particularly with appropriate gradient handling, can serve as a strong, versatile revived classic across a broad range of deep learning vision models.
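The forward/backward split described in the abstract can be sketched in a few lines. The minimal NumPy illustration below keeps the exact ReLU in the forward pass and swaps in a smooth surrogate derivative in the backward pass; the sigmoid-derivative surrogate and the `alpha` sharpness parameter are illustrative assumptions, not the paper's specific surrogate choice:

```python
import numpy as np

def sugar_forward(x):
    # Forward pass: standard ReLU, so sparsity and
    # computational efficiency are preserved.
    return np.maximum(x, 0.0)

def sugar_backward(x, grad_out, alpha=1.0):
    # Backward pass: instead of the exact ReLU derivative
    # (1 for x > 0, 0 otherwise), use a smooth surrogate so
    # that inactive ("dead") units still receive a nonzero
    # gradient and can be revived during training.
    # Sigmoid-derivative surrogate (illustrative assumption):
    s = 1.0 / (1.0 + np.exp(-alpha * x))
    surrogate_grad = alpha * s * (1.0 - s)
    return grad_out * surrogate_grad
```

In an autodiff framework this amounts to a custom gradient for ReLU, which is why the method is plug-and-play: the forward computation of any existing architecture is unchanged.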
Problem

Research questions and friction points this paper is trying to address.

Addressing the dying ReLU problem in deep learning
Enhancing ReLU performance with surrogate gradient learning
Reviving ReLU for modern architectures like transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces SUGAR for ReLU gradient handling
Uses smooth surrogate to prevent dead ReLUs
Enhances performance in various network architectures
Coşku Can Horuz
Institute of Robotics and Cognitive Systems, University of Lübeck
Geoffrey Kasenbacher
Institute of Robotics and Cognitive Systems, University of Lübeck; Mercedes-Benz AG
Saya Higuchi
Institute of Robotics and Cognitive Systems, University of Lübeck
Sebastian Kairat
Institute of Robotics and Cognitive Systems, University of Lübeck
Jendrik Stoltz
Institute of Robotics and Cognitive Systems, University of Lübeck
Moritz Pesl
Institute of Robotics and Cognitive Systems, University of Lübeck
Bernhard A. Moser
SCCH and Institute of Signal Processing, JKU, Austria
Applied Mathematics, Machine Learning, Spike-based Signal Processing and Learning
C. Linse
Institute of Neuro- and Bioinformatics, University of Lübeck
Thomas Martinetz
Professor of Computer Science, University of Lübeck
neural networks, machine learning, artificial intelligence, computational neuroscience, künstliche
Sebastian Otte
Institute for Robotics and Cognitive Systems
Artificial Intelligence, Machine Learning, Neural Networks