🤖 AI Summary
To address the poor real-time performance and high-dimensional sampling overhead of path planning in autonomous parking systems operating in complex environments, this paper proposes a reinforcement learning–driven Monte Carlo Tree Search (RL-MCTS) framework. Unlike conventional approaches, the method jointly learns a state-value function and an action policy from scratch, without relying on analytical heuristics or human expert demonstrations, and uses them as a lightweight, adaptive mechanism for trading off exploration against exploitation. Through online end-to-end optimization, the framework achieves significant speedups (several-fold faster than traditional MCTS) while maintaining high path quality and safety guarantees. Extensive experiments demonstrate strong robustness and real-time decision-making capability across challenging parking scenarios, including multi-obstacle configurations, narrow slots, and dynamic disturbances. The proposed approach establishes a deployable, model-free planning paradigm for autonomous parking.
📝 Abstract
In this paper, we present a method that integrates reinforcement learning into Monte Carlo tree search to speed up online path planning in fully observable environments for automated parking tasks. Sampling-based planning in high-dimensional spaces can be computationally expensive and time-consuming. State evaluation methods help by bringing prior knowledge into the search steps, making the process faster in a real-time system. Because automated parking tasks are often executed in complex environments, a robust yet lightweight heuristic is difficult to construct analytically. To overcome this limitation, we propose a reinforcement learning pipeline built around Monte Carlo tree search within the path planning framework. By iteratively learning the value of a state and the best action among samples from the previous cycle’s outcomes, we obtain a value estimator and a policy generator for given states. This yields a mechanism that balances exploration and exploitation, speeding up path planning while maintaining path quality, without using human expert driver data.
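The abstract does not state the selection rule used inside the tree search. As a minimal sketch of how a learned value estimator and policy generator can balance exploration and exploitation, the Python below assumes a PUCT-style rule of the kind popularized by AlphaZero. That rule is a swapped-in illustration, not the authors' confirmed method, and the names `Node`, `puct_score`, `select_action`, `backup`, and the constant `c_puct` are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node. All field names are illustrative assumptions."""
    prior: float              # policy generator's probability for the action leading here
    visit_count: int = 0      # how many simulations passed through this node
    value_sum: float = 0.0    # sum of value-estimator outputs backed up through this node
    children: dict = field(default_factory=dict)  # action label -> Node

    def mean_value(self) -> float:
        # Average backed-up value; 0.0 for a node that was never visited.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    """Assumed PUCT score: exploitation term plus prior-weighted exploration bonus."""
    exploit = child.mean_value()
    explore = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return exploit + explore


def select_action(parent: Node, c_puct: float = 1.5):
    """Return the action whose child has the highest PUCT score."""
    return max(parent.children,
               key=lambda a: puct_score(parent, parent.children[a], c_puct))


def backup(path, value: float) -> None:
    """Propagate a value estimate along the visited path (root to leaf)."""
    for node in path:
        node.visit_count += 1
        node.value_sum += value


# Toy example with two hypothetical steering actions.
root = Node(prior=1.0, visit_count=4)
root.children["forward"] = Node(prior=0.6, visit_count=3, value_sum=1.5)
root.children["reverse"] = Node(prior=0.4)
first_choice = select_action(root)
backup([root, root.children[first_choice]], value=0.2)
second_choice = select_action(root)
```

In this toy tree the unvisited child is tried first because its exploration bonus dominates. Once a modest value has been backed up for it, the search returns to the child with the better running average. The priors and values would come from the learned policy and value models, which the sketch replaces with fixed numbers.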