🤖 AI Summary
This paper addresses the degradation of model generalization and robustness caused by excessively low entropy in output distributions. It proposes Entropy-Regularized Activation (ERA), a novel paradigm that introduces a differentiable, parameterized output activation function to explicitly enforce a lower bound on the sampling entropy of predicted distributions. By elevating the activation function to an entropy-control module, ERA achieves consistent performance gains across diverse domains with minimal overhead (under 7% additional compute). Specifically, it improves Qwen2.5-Math-7B’s score on AIME 2025 by 37.4%, surpasses strong baselines such as SAC by more than 30% in policy performance on HumanoidBench, and raises ResNet-50’s top-1 accuracy on ImageNet by 0.69%. Crucially, ERA requires no architectural modifications or changes to training objectives, offering a task-agnostic, low-overhead, and broadly applicable entropy-aware optimization framework for continuous control, large language model inference, and image classification.
📝 Abstract
We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models. Our approach demonstrates broad effectiveness across different domains: 1) for large language models (LLMs), it boosts the AIME 2025 score of Qwen2.5-Math-7B by 37.4%; 2) for continuous-control reinforcement learning agents, it improves performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; 3) for image classification, it raises ImageNet top-1 accuracy by 0.69% for ResNet-50. These gains are achieved with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
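To make the core idea concrete, here is a minimal illustrative sketch of an entropy-floor output transformation for a categorical distribution. This is *not* the paper's actual ERA activation (which is differentiable and parameterized); it simply demonstrates the concept of mapping raw logits to a distribution whose entropy is guaranteed to stay above a chosen threshold, here by bisecting on a softmax temperature. The function name and the bisection approach are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def entropy_floor_activation(logits, h_min, iters=60):
    """Map logits to a distribution with entropy >= h_min.

    Illustrative only: if the plain softmax already satisfies the
    floor, return it; otherwise bisect on a temperature T (softmax
    entropy increases monotonically in T) until the floor is met.
    Requires h_min < log(len(logits)), the maximum possible entropy.
    """
    p = softmax(logits)
    if entropy(p) >= h_min:
        return p
    lo, hi = 1.0, 1e6  # entropy(lo) < h_min, entropy(hi) ~ log K >= h_min
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(softmax(logits / mid)) < h_min:
            lo = mid
        else:
            hi = mid
    return softmax(logits / hi)  # hi always satisfies the floor
```

For example, sharply peaked logits such as `[10, 0, 0]` yield a near-deterministic softmax with entropy close to zero; passing them through `entropy_floor_activation(logits, 0.5)` flattens the distribution just enough to satisfy the 0.5-nat floor, while already-diffuse logits pass through unchanged.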