🤖 AI Summary
To address the safety-efficiency trade-off in mixed-autonomy traffic at roundabouts—where autonomous vehicles (AVs) coexist with human-driven vehicles—this paper proposes a conflict-free, zero-speed-loss decision-making framework. Methodologically, we introduce a novel hierarchical reinforcement learning architecture that integrates Kolmogorov–Arnold Networks (KANs) with an interpretable action review mechanism, synergistically combined with model predictive control (MPC) and dynamic path planning to jointly optimize collision avoidance and throughput. Unlike conventional RL approaches, our framework overcomes the longstanding challenge of balancing safety guarantees with real-time computational feasibility. Extensive simulations across diverse traffic volumes demonstrate a 42% reduction in collision rate, a 19% decrease in average passage time, and stable reward convergence (variance < 0.03), significantly outperforming state-of-the-art methods.
📝 Abstract
Safety and efficiency are crucial for autonomous driving in roundabouts, especially in the context of mixed traffic where autonomous vehicles (AVs) and human-driven vehicles coexist. This paper introduces a learning-based algorithm tailored to foster safe and efficient driving behaviors across varying levels of traffic flows in roundabouts. The proposed algorithm employs a deep Q-learning network to effectively learn safe and efficient driving strategies in complex multi-vehicle roundabouts. Additionally, a KAN (Kolmogorov-Arnold network) enhances the AVs' ability to learn their surroundings robustly and precisely. An action inspector is integrated to replace dangerous actions to avoid collisions when the AV interacts with the environment, and a route planner is proposed to enhance the driving efficiency and safety of the AVs. Moreover, a model predictive control is adopted to ensure stability and precision of the driving actions. The results show that our proposed system consistently achieves safe and efficient driving whilst maintaining a stable training process, as evidenced by the smooth convergence of the reward function and the low variance in the training curves across various traffic flows. Compared to state-of-the-art benchmarks, the proposed algorithm achieves a lower number of collisions and reduced travel time to destination.