Adaptive Parametric Activation

📅 2024-07-11

🏛️ European Conference on Computer Vision

📈 Citations: 7

✨ Influential: 0

🤖 AI Summary

The optimal activation function depends on the task. Sigmoid, the de-facto choice in balanced classification, is biased towards frequent classes and performs poorly in long-tailed (imbalanced) settings. Through a statistical analysis of the classification and intermediate layers of balanced and imbalanced networks, we show empirically that aligning the activation function with the data distribution improves performance in both settings. Building on this, we propose Adaptive Parametric Activation (APA), a learnable activation that unifies most common activation functions under a single formula and can be used in both intermediate and attention layers. On five imbalanced benchmarks (ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS), APA outperforms the state of the art. It also improves performance across various model architectures on further tasks: classification, object detection, visual instruction following, image generation and next-text-token prediction.

📝 Abstract

The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks; however, in imbalanced classification it proves inappropriate due to its bias towards frequent classes. In this work, we delve deeper into this phenomenon by performing a comprehensive statistical analysis of the classification and intermediate layers of both balanced and imbalanced networks, and we empirically show that aligning the activation function with the data distribution enhances performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, significantly outperforming the state-of-the-art on several imbalanced benchmarks such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS. We also extend APA to a plethora of other tasks, such as classification, detection, visual instruction following, image generation and next-text-token prediction. APA increases performance on multiple benchmarks across various model architectures. The code is available at https://github.com/kostas1515/AGLU.
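The abstract does not spell out the unified formula. A minimal sketch follows, assuming the form APA(z; κ, λ) = (λ·exp(−κz) + 1)^(−1/λ). This is our reading of the paper and should be checked against the official repository. Under this assumption, κ = λ = 1 recovers the Sigmoid, and λ → 0 with κ = 1 approaches the Gumbel function exp(−exp(−z)). The function name `apa`, the log-domain evaluation and the lower bound `eps` on λ are illustrative choices, not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def apa(z: float, kappa: float = 1.0, lam: float = 1.0, eps: float = 1e-4) -> float:
    """Sketch of APA(z) = (lam * exp(-kappa * z) + 1) ** (-1 / lam).

    ASSUMED formula, evaluated in the log domain via
    log(1 + lam * exp(-kappa * z)) = softplus(log(lam) - kappa * z),
    so that large negative z does not overflow exp().
    """
    lam = max(lam, eps)  # assumption: keep lambda strictly positive
    return math.exp(-softplus(math.log(lam) - kappa * z) / lam)


def sigmoid(z: float) -> float:
    """Reference Sigmoid, the kappa = lam = 1 special case."""
    return 1.0 / (1.0 + math.exp(-z))


def gumbel(z: float) -> float:
    """Reference Gumbel CDF exp(-exp(-z)), the lam -> 0, kappa = 1 limit."""
    return math.exp(-math.exp(-z))


if __name__ == "__main__":
    for z in (-3.0, 0.0, 2.5):
        print(z, apa(z), sigmoid(z), apa(z, lam=1e-4), gumbel(z))
```

For example, `apa(0.0)` returns 0.5 (the Sigmoid midpoint), while `apa(0.0, lam=1e-4)` is close to exp(−1) ≈ 0.368, so a small λ makes the curve asymmetric around zero. Working in the log domain means `apa(-1000.0)` underflows to 0.0 instead of raising an overflow error.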

Problem

Research questions and friction points this paper is trying to address.

Unifying diverse activation functions through a single adaptive parametric formula

Addressing activation function bias in imbalanced classification tasks

Enhancing performance across multiple domains including vision and language tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Parametric Activation unifies common activation functions

APA aligns the activation function with the data distribution

APA can be applied in both intermediate and attention layers
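In intermediate layers, an activation is often used as a gate on its own input, as SiLU does with z·σ(z). The sketch below assumes a gated form z·APA(z; κ, λ) built on the same assumed formula (λ·exp(−κz) + 1)^(−1/λ). The class name `GatedAPA` and its fields are hypothetical, and κ and λ are plain floats standing in for parameters that a real network would learn by backpropagation. With κ = λ = 1 the unit reduces to SiLU. With κ = 1.702 and λ = 1 it reproduces the well-known sigmoid approximation of GELU, z·σ(1.702z). This shows how one parametric family can cover several familiar activations.

```python
import math
from dataclasses import dataclass


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


@dataclass
class GatedAPA:
    """Hypothetical gated unit z * APA(z; kappa, lam) for intermediate layers.

    kappa and lam stand in for learnable scalars; in a real network they
    would be trainable parameters rather than fixed floats.
    """

    kappa: float = 1.0
    lam: float = 1.0
    eps: float = 1e-4  # assumed lower bound that keeps lam > 0

    def gate(self, z: float) -> float:
        """Assumed APA gate (lam * exp(-kappa * z) + 1) ** (-1 / lam)."""
        lam = max(self.lam, self.eps)
        return math.exp(-_softplus(math.log(lam) - self.kappa * z) / lam)

    def __call__(self, z: float) -> float:
        """Multiply the input by its own gate value."""
        return z * self.gate(z)


def silu(z: float) -> float:
    """Reference SiLU: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def gelu_sigmoid_approx(z: float) -> float:
    """Reference sigmoid approximation of GELU: z * sigmoid(1.702 * z)."""
    return z / (1.0 + math.exp(-1.702 * z))


if __name__ == "__main__":
    silu_like = GatedAPA(kappa=1.0, lam=1.0)
    gelu_like = GatedAPA(kappa=1.702, lam=1.0)
    gumbel_like = GatedAPA(kappa=1.0, lam=0.0)  # clamped to eps, near the Gumbel gate
    for z in (-2.0, 0.5, 2.0):
        print(z, silu_like(z), gelu_like(z), gumbel_like(z))
```

Because every setting of (κ, λ) is a point in one continuous family, gradient descent could move a layer between SiLU-like and Gumbel-gated behaviour (small λ) instead of requiring a fixed choice up front. In a framework such as PyTorch, `kappa` and `lam` would typically be `nn.Parameter` tensors.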
