Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

πŸ“… 2025-05-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address unstable temporal-difference (TD) optimization in multi-task reinforcement learning (MT-RL), caused by sparse rewards and gradient interference across tasks, this paper proposes an online MT-RL framework built around a classification-style value network. The method trains a high-capacity neural network that conditions the value function on learnable task embeddings and replaces the conventional TD regression loss with a cross-entropy loss, which substantially mitigates inter-task interference. This is presented as the first work to enable robust online training of large-scale classification-based value functions in MT-RL. Evaluated on seven benchmarks comprising over 280 tasks, the approach achieves state-of-the-art performance while preserving single-task optimality, strong multi-task generalization, and sample-efficient transfer to unseen tasks.
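A classification-style value loss of the kind summarized above is often implemented by projecting the scalar TD target onto a fixed grid of value atoms (a "two-hot" encoding) and training the network's per-atom logits with cross-entropy. The following NumPy sketch illustrates that idea; the two-hot scheme, bin count, and value range are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def two_hot(target, v_min, v_max, num_bins):
    """Project a scalar TD target onto two adjacent value atoms."""
    atoms = np.linspace(v_min, v_max, num_bins)
    target = float(np.clip(target, v_min, v_max))
    idx = np.searchsorted(atoms, target)  # first atom >= target
    probs = np.zeros(num_bins)
    if idx == 0:
        probs[0] = 1.0  # target sits exactly on the lowest atom
    else:
        lo, hi = atoms[idx - 1], atoms[idx]
        w = (target - lo) / (hi - lo)  # linear weight between neighbors
        probs[idx - 1] = 1.0 - w
        probs[idx] = w
    return probs

def cross_entropy_value_loss(logits, target_probs):
    """Cross-entropy between the predicted atom distribution and the two-hot target."""
    logits = logits - logits.max()  # stabilize the softmax
    log_p = logits - np.log(np.exp(logits).sum())
    return -(target_probs * log_p).sum()
```

For example, a target of 0.25 on a 3-atom grid over [0, 1] splits its mass evenly between the atoms at 0.0 and 0.5; the loss is then an ordinary cross-entropy against that soft target rather than a squared TD error.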

πŸ“ Abstract
Recent advances in language modeling and vision stem from training large models on diverse, multi-task data. This paradigm has had limited impact in value-based reinforcement learning (RL), where improvements are often driven by small models trained in a single-task context. This is because, in multi-task RL, sparse rewards and gradient conflicts make temporal-difference optimization brittle. Practical workflows for generalist policies therefore avoid online training, instead cloning expert trajectories or distilling collections of single-task policies into one agent. In this work, we show that the use of high-capacity value models trained via cross-entropy and conditioned on learnable task embeddings addresses the problem of task interference in online RL, allowing for robust and scalable multi-task training. We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL. We find that, despite its simplicity, the proposed approach leads to state-of-the-art single- and multi-task performance, as well as sample-efficient transfer to new tasks.
Problem

Research questions and friction points this paper is trying to address.

Addresses task interference in online reinforcement learning
Enables robust multi-task training with high-capacity value models
Improves performance and transfer efficiency across diverse tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-capacity value models for multi-task RL
Cross-entropy training with learnable task embeddings
Robust scalable online multi-task training
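The "learnable task embeddings" contribution above amounts to giving the value network an embedding row per task and concatenating it to the observation before the shared trunk, so one network serves all tasks. A minimal NumPy forward-pass sketch follows; all shapes, names, and the tiny two-layer head are hypothetical illustrations, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TASKS, EMBED_DIM, OBS_DIM, HIDDEN, NUM_BINS = 4, 8, 16, 32, 51

# Learnable task-embedding table (one row per task); during training these
# rows would receive gradients like any other parameter.
task_embeddings = rng.normal(size=(NUM_TASKS, EMBED_DIM))

# Weights of a small value head (hypothetical sizes, for illustration only).
W1 = rng.normal(size=(OBS_DIM + EMBED_DIM, HIDDEN)) * 0.1
W2 = rng.normal(size=(HIDDEN, NUM_BINS)) * 0.1

def value_logits(obs, task_id):
    """Task-conditioned value head: concat the task embedding to the
    observation, then output one logit per value atom."""
    x = np.concatenate([obs, task_embeddings[task_id]])
    h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
    return h @ W2                # logits over NUM_BINS value atoms
```

In a classification-style setup, these per-atom logits would be trained with a cross-entropy loss against a discretized TD target rather than regressed to a scalar.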
πŸ”Ž Similar Papers
No similar papers found.