λ-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks

📅 2026-03-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of reconciling the training benefits of smooth activation functions with the deployment compatibility of ReLU. The authors propose λ-GELU, a GELU variant with a learnable sharpness parameter λ that enables a controlled transition from smooth training to ReLU-equivalent inference. Through a constrained reparameterization and an optimizer-aware update rule for λ, the method yields structured inter-layer sharpness distributions across diverse architectures, including MLPs, CNNs, and Transformers, and permits post-training replacement of λ-GELU with ReLU with minimal conversion interference, combining training efficiency with ReLU-friendly deployment.
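To see why increasing gate sharpness amounts to ReLU-ization, here is a minimal numerical sketch (assuming PyTorch; the function name lambda_gelu is illustrative, not from the paper) of the form f(x; λ) = xΦ(λx) stated in the abstract below: at λ = 1 it is the standard GELU, and as λ grows the Gaussian-CDF gate hardens toward a unit step, so the activation approaches ReLU.

```python
import torch

def lambda_gelu(x: torch.Tensor, lam: float) -> torch.Tensor:
    # f(x; lambda) = x * Phi(lambda * x), with Phi the standard Gaussian CDF;
    # lam = 1.0 recovers the usual GELU.
    return x * 0.5 * (1.0 + torch.erf(lam * x / 2.0 ** 0.5))

# The gate Phi(lambda * x) hardens toward a unit step as lambda grows,
# so lambda-GELU converges pointwise to ReLU.
x = torch.linspace(-3.0, 3.0, steps=1001)
for lam in (1.0, 4.0, 16.0, 64.0):
    gap = (lambda_gelu(x, lam) - torch.relu(x)).abs().max().item()
    print(f"lambda = {lam:5.1f}   max |lambda-GELU - ReLU| = {gap:.4f}")
```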

📝 Abstract
Gaussian Error Linear Unit (GELU) is a widely used smooth alternative to the Rectified Linear Unit (ReLU), yet many deployment, compression, and analysis toolchains are most naturally expressed for piecewise-linear (ReLU-type) networks. We study a hardness-parameterized formulation of GELU, f(x; λ) = xΦ(λx), where Φ is the Gaussian CDF and λ ∈ [1, ∞) controls gate sharpness, with the goal of turning smooth gated training into a controlled path toward ReLU-compatible models. Learning λ is non-trivial: naive updates yield unstable dynamics and effective gradient attenuation, so we introduce a constrained reparameterization and an optimizer-aware update scheme. Empirically, across a diverse set of model-dataset pairs spanning MLPs, CNNs, and Transformers, we observe structured layerwise hardness profiles and assess their robustness under different initializations. We further study a deterministic ReLU-ization strategy in which the learned gates are progressively hardened toward a principled target, enabling a post-training substitution of λ-GELU by ReLU with reduced disruption. Overall, λ-GELU provides a minimal and interpretable knob to profile and control gating hardness, bridging smooth training with ReLU-centric downstream pipelines.
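The following is a minimal PyTorch sketch of the ingredients named in the abstract; it is not the authors' implementation. The module and helper names (LambdaGELU, relu_ize_) are hypothetical, the softplus-based reparameterization is only one possible way to keep λ ≥ 1, and the paper's optimizer-aware update for λ is not reproduced.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LambdaGELU(nn.Module):
    """lambda-GELU: f(x; lambda) = x * Phi(lambda * x), with a learnable
    sharpness constrained to lambda >= 1 (lambda = 1 is standard GELU).
    The softplus reparameterization below is just one way to enforce the
    [1, inf) constraint; the paper's exact scheme and its optimizer-aware
    update for lambda are not reproduced here."""

    def __init__(self, init_lambda: float = 1.0):
        super().__init__()
        # Unconstrained parameter rho such that lambda = 1 + softplus(rho).
        init_rho = math.log(math.expm1(max(init_lambda - 1.0, 1e-6)))
        self.rho = nn.Parameter(torch.tensor(init_rho))

    @property
    def lam(self) -> torch.Tensor:
        return 1.0 + F.softplus(self.rho)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.lam * x
        return x * 0.5 * (1.0 + torch.erf(z / math.sqrt(2.0)))


def relu_ize_(module: nn.Module) -> None:
    """Replace every LambdaGELU submodule with nn.ReLU in place, mimicking
    the post-training substitution described in the abstract."""
    for name, child in module.named_children():
        if isinstance(child, LambdaGELU):
            setattr(module, name, nn.ReLU())
        else:
            relu_ize_(child)


# Usage sketch: train with smooth, learnable gates, then deploy with ReLU.
mlp = nn.Sequential(nn.Linear(16, 32), LambdaGELU(), nn.Linear(32, 10))
# ... training loop, optionally hardening lambda toward a target, goes here ...
relu_ize_(mlp)   # the deployed network now contains only ReLU activations
```

Note that the substitution is exact only in the limit λ → ∞; with a finite learned λ it is an approximation, which is why the abstract describes progressively hardening the gates toward a principled target so that the final swap causes reduced disruption.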
Problem

Research questions and friction points this paper is trying to address.

GELU
ReLU
activation function
model conversion
gating hardness
Innovation

Methods, ideas, or system contributions that make the work stand out.

λ-GELU
gating hardness
ReLU-ization
constrained reparameterization
optimizer-aware update
Cristian Pérez-Corral
Universitat Politècnica de València, Valencia, Spain
Alberto Fernández-Hernández
Universitat Politècnica de València, Valencia, Spain
Jose I. Mestre
Universitat Jaume I, Castelló de la Plana, Spain
Manuel F. Dolz
Universitat Jaume I, Castelló de la Plana, Spain
Enrique S. Quintana-Ortí
Universitat Politècnica de València, Valencia, Spain