APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning

📅 2022-07-05

🏛️ International Journal of Artificial Intelligence and Machine Learning

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

Existing advanced activation functions (e.g., Mish, Swish) suffer from high computational overhead, impeding training efficiency and hardware deployment. To address this, we propose APTx, a lightweight analytic activation function grounded in smooth self-gating. APTx employs only elementary arithmetic operations—explicitly avoiding nested exponential or logarithmic computations—thereby substantially reducing floating-point operations. Notably, APTx is the first activation function to achieve Mish-level modeling capability while maintaining an exceptionally minimal computational structure. Extensive experiments across diverse CV and NLP benchmark models demonstrate that APTx accelerates training by 12–18% over Mish, reduces GPU memory consumption by 9%, and matches or slightly exceeds Mish in accuracy. The implementation is open-sourced and seamlessly integrated into the PyTorch ecosystem.

📝 Abstract

Activation functions introduce non-linearity into deep neural networks. This non-linearity helps neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions have been developed and are chosen according to the type of problem. ReLU's variants, SWISH, and MISH are the go-to activation functions. MISH is considered to perform similarly to or even better than SWISH, and much better than ReLU. In this paper, we propose an activation function named APTx, which behaves similarly to MISH but requires fewer mathematical operations to compute. The lower computational cost of APTx speeds up model training and thus also reduces the hardware requirements of the deep learning model. Source code: https://github.com/mr-ravin/aptx_activation
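To make the comparison in the abstract concrete, here is a minimal scalar sketch of the three functions in plain Python. The abstract does not state the APTx formula, so the form used here, (alpha + tanh(beta * x)) * gamma * x with defaults alpha = 1, beta = 1, gamma = 0.5, is an assumption based on how APTx is usually written; the linked repository is the place to check the authoritative definition. The function and parameter names are illustrative and are not taken from that repository.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # MISH: x * tanh(softplus(x)); needs exp, log, and tanh.
    return x * math.tanh(softplus(x))


def swish(x: float) -> float:
    # SWISH: x * sigmoid(x), with sigmoid written via tanh for stability.
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


def aptx(x: float, alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5) -> float:
    # Assumed APTx form: (alpha + tanh(beta * x)) * gamma * x.
    # The formula and default parameters are an assumption; they are
    # not given in the summary above.
    return (alpha + math.tanh(beta * x)) * gamma * x


if __name__ == "__main__":
    # Print the three activations side by side on a few sample inputs.
    print(f"{'x':>6} {'mish':>10} {'swish':>10} {'aptx':>10}")
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"{x:>6.1f} {mish(x):>10.5f} {swish(x):>10.5f} {aptx(x):>10.5f}")
```

Under this assumed form, one forward evaluation needs a single tanh plus a few multiplications and additions, whereas MISH chains an exponential, a logarithm, and a tanh. That difference in operation count is what the abstract's efficiency claim rests on.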

Problem

Research questions and friction points this paper is trying to address.

Proposing APTx as a more efficient activation function

Reducing computational operations compared to MISH and SWISH

Enhancing training speed and lowering hardware requirements

Innovation

Methods, ideas, or system contributions that make the work stand out.

APTx activation function for deep learning

Fewer computations than MISH and SWISH

Speeds up training and reduces hardware needs

🔎 Similar Papers

APALU: A Trainable, Adaptive Activation Function for Deep Learning Networks