🤖 AI Summary
To address target bias, poor policy flexibility, and slow convergence in autonomous driving reinforcement learning—caused by monolithic action representations and scalarized single rewards—this paper proposes a hybrid parameterized action space coupled with a multi-objective decoupled critic architecture. Our approach jointly enables abstract decision-making and fine-grained control to ensure execution-layer compatibility across multiple objectives, while uncertainty-driven exploration accelerates optimization-layer convergence. Innovatively integrating parameterized action representations, multi-head critic networks, and Pareto-aware reward aggregation, the method is trained jointly on HighD real-world trajectory data and high-fidelity simulation. Compared to baseline methods, it achieves a 42% improvement in training efficiency and a 31% increase in Pareto-front coverage across driving objectives—significantly enhancing driving efficiency, action consistency, and safety.
📝 Abstract
Reinforcement Learning (RL) has shown excellent performance in solving decision-making and control problems of autonomous driving, which is increasingly applied in diverse driving scenarios. However, driving is a multi-attribute problem, leading to challenges in achieving multi-objective compatibility for current RL methods, especially in both policy execution and policy iteration. On the one hand, the common action space structure with single action type limits driving flexibility or results in large behavior fluctuations during policy execution. On the other hand, the multi-attribute weighted single reward function result in the agent's disproportionate attention to certain objectives during policy iterations. To this end, we propose a Multi-objective Ensemble-Critic reinforcement learning method with Hybrid Parametrized Action for multi-objective compatible autonomous driving. Specifically, a parameterized action space is constructed to generate hybrid driving actions, combining both abstract guidance and concrete control commands. A multi-objective critics architecture is constructed considering multiple attribute rewards, to ensure simultaneously focusing on different driving objectives. Additionally, uncertainty-based exploration strategy is introduced to help the agent faster approach viable driving policy. The experimental results in both the simulated traffic environment and the HighD dataset demonstrate that our method can achieve multi-objective compatible autonomous driving in terms of driving efficiency, action consistency, and safety. It enhances the general performance of the driving while significantly increasing training efficiency.