🤖 AI Summary
This work addresses two obstacles to learning autonomous parking in narrow, complex environments: the low sample efficiency of reinforcement learning, and imitation learning's heavy reliance on extensive expert data combined with poor generalization. To overcome these limitations, the authors propose CIL-SERL, a sample-efficient reinforcement learning framework built on corrective experience. Inspired by the way humans learn from mistakes, CIL-SERL introduces a novel multi-level replay buffer that structurally integrates standard trajectories, human interventions, failed explorations, and rollback-correction segments. The framework is trained end to end in a high-fidelity 3D Gaussian Splatting (3DGS) simulation environment. Experimental results show that CIL-SERL substantially improves sample efficiency, policy robustness, and generalization, achieving higher parking success rates, better operational efficiency, and improved safety both in simulation and on a real vehicle platform.
📝 Abstract
Autonomous parking demands precise low-speed maneuvering within narrow, cluttered, and highly constrained environments, where vehicles must navigate tight spaces while avoiding static obstacles and complex geometric boundaries. Imitation learning typically requires massive volumes of high-quality expert demonstrations to converge to a stable policy and often generalizes poorly to unseen scenarios, while traditional reinforcement learning (RL) methods suffer from excessive training overhead, inefficient exploration, and, in challenging settings, outright failure to learn viable parking strategies. To address these limitations, this paper presents a correction-in-the-loop sample-efficient reinforcement learning (CIL-SERL) framework for end-to-end autonomous parking, trained entirely in a photorealistic 3D Gaussian Splatting (3DGS) parking simulator that enables high-fidelity digital reconstruction of real-world scenes. Inspired by the error-correction notebooks used in human learning, we design a novel multi-level replay buffer mechanism. These buffers hierarchically organize and store standard RL rollouts, human corrective interventions, failed exploration trajectories, and rollback-based correction segments in separate yet interconnected memory regions, facilitating structured sampling and targeted learning during training. The proposed framework is systematically evaluated both in the 3DGS simulation environment and on a physical vehicle platform. Extensive experimental results show that our method achieves substantial improvements in parking success rate, operational efficiency, and safety across diverse scenarios, validating the effectiveness and practical applicability of the proposed CIL-SERL-based end-to-end autonomous parking solution.
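The abstract does not specify how the four memory regions are sized or how training batches are drawn from them. The sketch below shows one plausible reading, assuming bounded FIFO storage per region and stratified sampling with fixed per-region ratios. The class name `MultiLevelReplayBuffer`, the region labels, and the ratio values are hypothetical illustrations, not details taken from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Transition:
    obs: Any
    action: Any
    reward: float
    next_obs: Any
    done: bool


# Hypothetical labels for the four categories named in the abstract.
REGIONS = ("standard", "intervention", "failure", "rollback")


class MultiLevelReplayBuffer:
    """Separate FIFO memory regions with stratified mini-batch sampling.

    ASSUMPTION: capacities and sampling ratios are illustrative only,
    not values reported for CIL-SERL.
    """

    def __init__(self, capacity_per_region: int, ratios: Dict[str, float], seed: int = 0):
        if set(ratios) != set(REGIONS):
            raise ValueError(f"ratios must cover exactly {REGIONS}")
        total = sum(ratios.values())
        self.ratios = {k: v / total for k, v in ratios.items()}
        # Each region evicts its oldest transitions once full.
        self.regions = {k: deque(maxlen=capacity_per_region) for k in REGIONS}
        self.rng = random.Random(seed)

    def add(self, region: str, transition: Transition) -> None:
        self.regions[region].append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Only non-empty regions participate; their ratios are renormalized.
        active = {k: w for k, w in self.ratios.items() if self.regions[k]}
        if not active:
            raise ValueError("buffer is empty")
        norm = sum(active.values())
        counts = {k: int(batch_size * w / norm) for k, w in active.items()}
        # Hand the slots lost to flooring to the highest-weight regions.
        remainder = batch_size - sum(counts.values())
        for k in sorted(active, key=active.get, reverse=True)[:remainder]:
            counts[k] += 1
        batch: List[Transition] = []
        for k, n in counts.items():
            # Sampling with replacement lets a small region fill its quota.
            batch.extend(self.rng.choices(self.regions[k], k=n))
        self.rng.shuffle(batch)
        return batch
```

Sampling with replacement lets a sparsely populated region, such as human interventions, contribute its full share to every batch, which is one simple way to obtain the "targeted learning" effect the abstract describes. Whether CIL-SERL weights its regions this way, or adapts the ratios during training, is not stated in the abstract.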