Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Systematic scaling laws for deep reinforcement learning (DRL) remain unexplored. This work establishes the first unified DRL scaling framework across data, model, and training dimensions, introducing a co-scaling paradigm that jointly optimizes data efficiency, model expressivity, and training throughput. We integrate key techniques—including parallel environment sampling, high replay-ratio experience reuse, large-batch distributed training, Mixture-of-Experts (MoE) and ensemble architectures, and auxiliary tasks—to substantially improve scalability in large-scale DRL. We propose the first taxonomy of DRL scaling behaviors and quantitatively characterize the impact of each dimension on decision-making performance, revealing fundamental trade-offs and scaling boundaries. Our findings provide a reproducible theoretical foundation and practical engineering guidelines for scaling decision intelligence in robotics control, autonomous driving, and large language model–based agents.

📝 Abstract
In recent years, the expansion of neural network models and training data has driven remarkable progress in deep learning, particularly in computer vision and natural language processing. This advancement is underpinned by the concept of Scaling Laws, which demonstrates that scaling model parameters and training data enhances learning performance. While these fields have witnessed breakthroughs, such as the development of large language models like GPT-4 and generative vision models like Midjourney, the application of scaling laws in deep reinforcement learning (DRL) remains relatively unexplored. Despite their potential to improve performance, scaling laws have not been fully integrated into DRL for decision making. This review addresses this gap by systematically analyzing scaling strategies in three dimensions: data, network, and training budget. In data scaling, we explore methods to optimize data efficiency through parallel sampling and data generation, examining the relationship between data volume and learning outcomes. For network scaling, we investigate architectural enhancements, including monolithic expansions, ensemble and MoE methods, and agent-number scaling techniques, which collectively enhance model expressivity while posing unique computational challenges. Lastly, in training budget scaling, we evaluate the impact of distributed training, high replay ratios, large batch sizes, and auxiliary training on training efficiency and convergence. By synthesizing these strategies, this review not only highlights their synergistic roles in advancing DRL for decision making but also provides a roadmap for future research. We emphasize the importance of balancing scalability with computational efficiency and outline promising directions for leveraging scaling to unlock the full potential of DRL in various tasks such as robot control, autonomous driving, and LLM training.
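The data-scaling strategy the abstract mentions, parallel environment sampling, can be illustrated with a minimal sketch: stepping several environments in lockstep so each wall-clock step yields many transitions instead of one. `ToyEnv` and the trivial policy below are hypothetical placeholders, not any library's API.

```python
import random

class ToyEnv:
    """Hypothetical stand-in for a real environment (e.g. a robotics task)."""
    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        obs = float(self.t + action)
        reward = random.random()
        done = self.t >= 5  # short episodes for the sketch
        return obs, reward, done

def collect_parallel(envs, policy, num_steps):
    """Step all environments in lockstep; each iteration yields
    len(envs) transitions rather than one."""
    obs = [env.reset() for env in envs]
    buffer = []
    for _ in range(num_steps):
        actions = [policy(o) for o in obs]
        for i, (env, a) in enumerate(zip(envs, actions)):
            next_obs, reward, done = env.step(a)
            buffer.append((obs[i], a, reward, next_obs, done))
            obs[i] = env.reset() if done else next_obs
    return buffer

envs = [ToyEnv() for _ in range(8)]  # 8 parallel workers
data = collect_parallel(envs, policy=lambda o: 1, num_steps=10)
print(len(data))  # 8 envs x 10 steps = 80 transitions
```

In practice this loop is vectorized or run across processes (as in asynchronous vectorized environments), but the throughput argument is the same: data collection scales linearly with the number of environment copies.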
Problem

Research questions and friction points this paper is trying to address.

Exploring scaling laws in deep reinforcement learning for decision making.
Analyzing data, network, and training budget scaling strategies.
Balancing scalability and efficiency in DRL for diverse applications.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizing data efficiency via parallel sampling and data generation.
Enhancing model expressivity through architectural improvements (monolithic expansion, ensembles, MoE).
Boosting training efficiency with distributed training, high replay ratios, and large batches.
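The training-budget lever highlighted above, a high replay ratio, means performing several gradient updates per environment step by reusing stored transitions. A minimal sketch, with a counter standing in for the actual gradient step and placeholder transitions standing in for real experience:

```python
import random
from collections import deque

def train_with_replay_ratio(num_env_steps, replay_ratio, batch_size=4):
    """Perform `replay_ratio` updates per environment step,
    reusing buffered transitions (the high replay-ratio idea)."""
    buffer = deque(maxlen=10_000)  # replay buffer
    updates = 0
    for step in range(num_env_steps):
        buffer.append((step, step % 3, 1.0))  # placeholder transition
        if len(buffer) >= batch_size:
            for _ in range(replay_ratio):
                batch = random.sample(buffer, batch_size)  # experience reuse
                updates += 1  # stand-in for one gradient step on `batch`
    return updates

# Replay ratio 4: roughly 4x more gradient updates than environment steps.
print(train_with_replay_ratio(num_env_steps=100, replay_ratio=4))  # 388
```

The trade-off the survey examines follows directly: raising the replay ratio extracts more learning signal per collected transition, but past a point it overfits the buffer and destabilizes training, which is one of the scaling boundaries discussed.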
Yi Ma
School of Computer and Information Technology, Shanxi University
Hongyao Tang
Mila/UdeM
Reinforcement Learning, Embodied Intelligence, Foundation Model
Chenjun Xiao
School of Data Science, The Chinese University of Hong Kong (Shenzhen)
Yaodong Yang
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Wei Wei
School of Computer and Information Technology, Shanxi University
Jianye Hao
Huawei Noah's Ark Lab/Tianjin University
Multiagent Systems, Embodied AI
Jiye Liang
Shanxi University