Action Space Reduction Strategies for Reinforcement Learning in Autonomous Driving

📅 2025-07-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address low exploration efficiency and training instability caused by high-dimensional action spaces in autonomous driving reinforcement learning, this paper proposes a dual-mechanism framework: (1) context-aware dynamic action masking, which leverages semantic image sequences and vehicle state to filter out invalid actions in real time; and (2) state-transition-driven relative action encoding, which compresses action dimensionality while preserving action consistency. The method is integrated into a multimodal Proximal Policy Optimization (PPO) framework that fuses visual and sensor inputs, enabling structured action space modeling and real-time policy optimization. Experiments demonstrate significant improvements in training stability and convergence speed. Specifically, the approach achieves a 12.7% increase in control accuracy across diverse complex driving scenarios and exhibits superior generalization performance compared to both fixed-dimension reduction and full-action-space baselines.

Technology Category

Application Category

📝 Abstract
Reinforcement Learning (RL) offers a promising framework for autonomous driving by enabling agents to learn control policies through interaction with environments. However, large and high-dimensional action spaces often used to support fine-grained control can impede training efficiency and increase exploration costs. In this study, we introduce and evaluate two novel structured action space modification strategies for RL in autonomous driving: dynamic masking and relative action space reduction. These approaches are systematically compared against fixed reduction schemes and full action space baselines to assess their impact on policy learning and performance. Our framework leverages a multimodal Proximal Policy Optimization agent that processes both semantic image sequences and scalar vehicle states. The proposed dynamic and relative strategies incorporate real-time action masking based on context and state transitions, preserving action consistency while eliminating invalid or suboptimal choices. Through comprehensive experiments across diverse driving routes, we show that action space reduction significantly improves training stability and policy performance. The dynamic and relative schemes, in particular, achieve a favorable balance between learning speed, control precision, and generalization. These findings highlight the importance of context-aware action space design for scalable and reliable RL in autonomous driving tasks.
Problem

Research questions and friction points this paper is trying to address.

Reducing large action spaces in RL for autonomous driving
Improving training efficiency and policy performance
Balancing learning speed and control precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic masking for real-time action filtering
Relative action space reduction for efficiency
Multimodal PPO agent processing image and scalar inputs
🔎 Similar Papers
No similar papers found.