TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse

📅 2026-02-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge that Transformer-based Q-functions in reinforcement learning fail to scale effectively with model size, often suffering from training instability and performance degradation due to attention score collapse. The study identifies, for the first time, that this issue stems from a sharp decline in attention entropy as model capacity increases. To mitigate this, the authors propose an attention entropy regularization mechanism that explicitly controls the attention distribution, thereby preventing collapse. This approach substantially enhances both the training stability and performance of large-scale Transformer Q-functions, achieving up to a 43% performance gain when scaling from the smallest to the largest model configuration, whereas existing methods exhibit noticeable degradation under the same conditions.

📝 Abstract
Despite scale driving substantial recent advancements in machine learning, reinforcement learning (RL) methods still primarily use small value functions. Naively scaling value functions -- including with a transformer architecture, which is known to be highly scalable -- often results in learning instability and worse performance. In this work, we ask: what prevents transformers from scaling effectively for value functions? Through empirical analysis, we identify the critical failure mode in this scaling: attention scores collapse as capacity increases. Our key insight is that we can effectively prevent this collapse and stabilize training by controlling the entropy of the attention scores, thereby enabling the use of larger models. To this end, we propose Transformer Q-Learning (TQL), a method that unlocks the scaling potential of transformers in learning value functions in RL. Our approach yields up to a 43% improvement in performance when scaling from the smallest to the largest network sizes, while prior methods suffer from performance degradation.
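The core idea described above -- keeping the entropy of the attention distribution from collapsing -- can be sketched in a few lines. This is a minimal NumPy illustration under assumptions of our own (the function names, the additive entropy-bonus form, and the `coef` weight are hypothetical), not the paper's actual TQL implementation:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_entropy(attn_logits):
    """Mean Shannon entropy of the attention distributions.

    attn_logits: array of shape (..., queries, keys). Collapsed attention
    (nearly one-hot rows) yields entropy near 0; uniform attention over
    n keys yields log(n).
    """
    p = softmax(attn_logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def regularized_loss(td_loss, attn_logits, coef=0.01):
    """Hypothetical regularized objective: the usual TD loss minus an
    entropy bonus on the attention scores, so low-entropy (collapsed)
    attention is penalized during training."""
    return td_loss - coef * attention_entropy(attn_logits)
```

A uniform attention pattern over 4 keys has entropy `log(4) ≈ 1.386`, while a collapsed (one-hot) pattern has entropy near 0, so the regularized loss favors the former; in practice such a penalty would be applied to the attention logits of each transformer layer inside the Q-function's training loss.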
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
value function scaling
attention collapse
transformer
Q-function
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention collapse
entropy regularization
Transformer Q-Learning
value function scaling
reinforcement learning
🔎 Similar Papers
2024-08-10 · AAAI Conference on Artificial Intelligence · Citations: 30