🤖 AI Summary
This work reveals that continuously differentiable activation functions (e.g., tanh) suffer from pervasive neuron death and vanishing gradients in reinforcement learning, severely impairing representational capacity and training stability. To address this, we propose a Hadamard-product-based activation enhancement mechanism: a lightweight, plug-and-play module that applies learnable, element-wise modulation to amplify gradient flow, mitigate neuron deactivation, and increase the network's effective rank. The method is architecture-agnostic and compatible with mainstream RL algorithms, including DQN, PPO, and parallelized Q-networks. Extensive evaluation on the Atari benchmark demonstrates that our approach significantly accelerates convergence, reduces the proportion of dead neurons by up to 42%, improves effective rank and policy stability, and delivers consistent performance gains across diverse algorithms. To our knowledge, this is the first solution enabling robust, reliable deployment of continuously differentiable activations in deep RL.
📝 Abstract
Activation functions are one of the key components of a deep neural network. The most commonly used activation functions fall into two categories: continuously differentiable functions (e.g., tanh) and piecewise-linear functions (e.g., ReLU). Each has its own strengths and drawbacks with respect to downstream performance and the representational capacity acquired through learning (measured, for example, by the number of dead neurons and the effective rank). In reinforcement learning, the performance of continuously differentiable activations often falls short compared to piecewise-linear functions. We provide insights into the vanishing gradients associated with the former and show that the dying neuron problem is not exclusive to ReLUs. To alleviate vanishing gradients and the resulting dying neuron problem occurring with continuously differentiable activations, we propose a Hadamard representation. Using deep Q-networks, proximal policy optimization and parallelized Q-networks in the Atari domain, we show faster learning, a reduction in dead neurons and increased effective rank.
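To make the idea concrete, the core mechanism can be sketched as an element-wise (Hadamard) product of two learned branches, so that when the tanh branch saturates, the gradient contribution from the second branch keeps flowing. This is a minimal illustrative sketch, not the paper's exact formulation: the function name, the choice of a sigmoid gate for the second branch, and the layer shapes are all assumptions.

```python
import numpy as np

def hadamard_tanh(x, W_a, W_b):
    """Hypothetical Hadamard representation of a tanh layer.

    The usual pre-activation tanh(x @ W_a) is modulated element-wise
    by a second learned branch (here assumed to be a sigmoid gate).
    Because d(a*b) = b*da + a*db, the layer's gradient is a sum of two
    product terms, so a saturated tanh branch alone does not zero out
    the gradient -- the motivation for mitigating dying neurons.
    """
    a = np.tanh(x @ W_a)                    # continuously differentiable branch
    b = 1.0 / (1.0 + np.exp(-(x @ W_b)))    # assumed gating branch (sigmoid)
    return a * b                            # Hadamard (element-wise) product

# Tiny usage example with arbitrary shapes:
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3))
W_a = rng.standard_normal((3, 4))
W_b = rng.standard_normal((3, 4))
h = hadamard_tanh(x, W_a, W_b)
```

Since tanh lies in (-1, 1) and the sigmoid gate in (0, 1), the product stays bounded in (-1, 1), preserving the bounded-output property of the original activation.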