🤖 AI Summary
To address low exploration efficiency and training instability caused by high-dimensional action spaces in autonomous driving reinforcement learning, this paper proposes a dual-mechanism framework: (1) context-aware dynamic action masking, which leverages semantic image sequences and vehicle state to filter out invalid actions in real time; and (2) state-transition-driven relative action encoding, which compresses action dimensionality while preserving action consistency. The method is integrated into a multimodal Proximal Policy Optimization (PPO) framework that fuses visual and sensor inputs, enabling structured action space modeling and real-time policy optimization. Experiments demonstrate significant improvements in training stability and convergence speed. Specifically, the approach achieves a 12.7% increase in control accuracy across diverse complex driving scenarios and exhibits superior generalization performance compared to both fixed-dimension reduction and full-action-space baselines.
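The paper does not publish its masking rules, but the idea of context-aware dynamic action masking can be sketched as follows: build a boolean mask over a discretized action set from the current vehicle state, then drive the probability of invalid actions to zero before sampling. The 9-action layout, thresholds, and rules below are illustrative assumptions, not the authors' actual design.

```python
import numpy as np

def dynamic_action_mask(speed, steering, n_actions=9,
                        max_speed=30.0, max_steer=0.8):
    """Build a boolean validity mask over a hypothetical 9-action set:
    {brake, coast, throttle} x {steer-left, straight, steer-right}.
    Thresholds and index groups are illustrative, not from the paper."""
    mask = np.ones(n_actions, dtype=bool)
    throttle_idx = [6, 7, 8]   # actions that accelerate
    left_idx = [0, 3, 6]       # actions that steer further left
    right_idx = [2, 5, 8]      # actions that steer further right
    if speed >= max_speed:     # at the speed limit: forbid throttle
        for i in throttle_idx:
            mask[i] = False
    if steering <= -max_steer: # left steering lock reached
        for i in left_idx:
            mask[i] = False
    if steering >= max_steer:  # right steering lock reached
        for i in right_idx:
            mask[i] = False
    return mask

def masked_softmax(logits, mask):
    """Set masked logits to -inf so invalid actions get zero probability."""
    logits = np.where(mask, logits, -np.inf)
    z = np.exp(logits - logits[mask].max())
    return z / z.sum()
```

In a PPO agent the masked distribution would replace the plain softmax over policy logits at both action sampling and log-probability computation, so gradients never flow toward actions the context has ruled out.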
📝 Abstract
Reinforcement Learning (RL) offers a promising framework for autonomous driving by enabling agents to learn control policies through interaction with environments. However, the large, high-dimensional action spaces often used to support fine-grained control can impede training efficiency and increase exploration costs. In this study, we introduce and evaluate two novel structured action space modification strategies for RL in autonomous driving: dynamic masking and relative action space reduction. These approaches are systematically compared against fixed reduction schemes and full action space baselines to assess their impact on policy learning and performance. Our framework leverages a multimodal Proximal Policy Optimization agent that processes both semantic image sequences and scalar vehicle states. The proposed dynamic and relative strategies incorporate real-time action masking based on context and state transitions, preserving action consistency while eliminating invalid or suboptimal choices. Through comprehensive experiments across diverse driving routes, we show that action space reduction significantly improves training stability and policy performance. The dynamic and relative schemes, in particular, achieve a favorable balance between learning speed, control precision, and generalization. These findings highlight the importance of context-aware action space design for scalable and reliable RL in autonomous driving tasks.
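The relative action space reduction described above can be illustrated with a minimal sketch: rather than selecting among many absolute steering/throttle levels, the agent picks a small delta applied to its previous control, which shrinks the action set while keeping consecutive commands consistent. The 3x3 delta grid and step sizes here are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np

# Hypothetical relative encoding: a 3x3 grid of deltas replaces a large
# grid of absolute control levels, so the policy outputs only 9 actions.
STEER_DELTAS = np.array([-0.1, 0.0, 0.1])
THROTTLE_DELTAS = np.array([-0.2, 0.0, 0.2])

def apply_relative_action(prev_steer, prev_throttle, action_idx):
    """Decode a flat index over the 3x3 delta grid into new controls.

    Clipping keeps the controls in their valid ranges, so repeated
    deltas in one direction saturate smoothly at the limits.
    """
    s_idx, t_idx = divmod(action_idx, len(THROTTLE_DELTAS))
    steer = float(np.clip(prev_steer + STEER_DELTAS[s_idx], -1.0, 1.0))
    throttle = float(np.clip(prev_throttle + THROTTLE_DELTAS[t_idx], 0.0, 1.0))
    return steer, throttle
```

Because each action is a bounded increment from the previous state, the transition-driven encoding also limits how abruptly the control signal can change between steps, which is one plausible reason the paper reports smoother, more stable training than with a full absolute action space.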