🤖 AI Summary
Autonomous urban parking is difficult because space is constrained and control must be highly precise, and the problem is compounded by the absence of high-quality, structured datasets tailored for end-to-end learning. This work proposes ParkingScenes, the first structured, multimodal simulation dataset designed specifically for parking tasks, built on the CARLA platform. It uses Hybrid A* path planning and an MPC controller to generate accurate, reproducible trajectories while synchronously capturing multi-view RGB images, depth maps, vehicle states, and bird’s-eye-view (BEV) representations. The dataset supports context-aware, multimodal fusion learning. Policies trained on it outperform those trained on unstructured, manually collected data under identical model architectures, which demonstrates the value of structured supervision for learning robust parking strategies.
📝 Abstract
Autonomous parking remains a critical yet challenging task in intelligent driving systems, particularly within constrained urban environments where maneuvering space is limited and precise control is essential. While recent advances in end-to-end learning have shown great promise, the lack of high-quality, structured datasets tailored for parking scenarios remains a significant bottleneck. To address this gap, we present ParkingScenes, a comprehensive multimodal dataset specifically designed for end-to-end autonomous parking in simulated scenes. Built on the CARLA simulator, ParkingScenes features structured parking trajectories generated by a Hybrid A* planner and a Model Predictive Controller (MPC), providing accurate and reproducible supervision signals. The dataset includes 16 reverse-in and 6 parallel parking scenarios, each executed under two pedestrian conditions (present and absent). Each scenario-condition pair is repeated 16 times to ensure consistent coverage, resulting in 704 structured episodes and approximately 105,000 frames. Each frame contains synchronized data from four RGB cameras, four depth sensors, vehicle motion states, and Bird's-Eye View (BEV) representations, enabling rich multimodal fusion and context-aware learning. To demonstrate the utility of our dataset, we compare models trained on ParkingScenes with those trained on unstructured, manually collected simulation data under identical conditions. Results show significant improvements in performance, underscoring the effectiveness of structured supervision for robust and accurate parking policy learning. By releasing both the dataset and the collection framework, ParkingScenes establishes a scalable and reproducible benchmark for advancing learning-based autonomous parking systems. The dataset and collection framework will be released at: https://github.com/haonan-ai/ParkingScenes
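The episode count quoted in the abstract follows from the scenario grid: (16 + 6) scenarios × 2 pedestrian conditions × 16 repetitions = 704. The short sketch below enumerates that grid to make the arithmetic explicit and derives a rough average episode length from the approximate frame total. The label strings and the function name `enumerate_episodes` are hypothetical and chosen only for illustration; they are not taken from the released collection framework.

```python
from itertools import product

# Counts below come from the abstract; the label strings are hypothetical.
MANEUVERS = {"reverse_in": 16, "parallel": 6}  # scenarios per maneuver type
PEDESTRIANS = (False, True)                    # pedestrians absent / present
REPEATS = 16                                   # repetitions per scenario-condition pair
APPROX_TOTAL_FRAMES = 105_000                  # approximate figure from the abstract


def enumerate_episodes():
    """Return one (maneuver, scenario_id, pedestrians, repeat) tuple per episode."""
    episodes = []
    for maneuver, n_scenarios in MANEUVERS.items():
        for scenario_id, pedestrians, repeat in product(
            range(n_scenarios), PEDESTRIANS, range(REPEATS)
        ):
            episodes.append((maneuver, scenario_id, pedestrians, repeat))
    return episodes


episodes = enumerate_episodes()
print(len(episodes))                               # total number of episodes
print(round(APPROX_TOTAL_FRAMES / len(episodes)))  # rough average frames per episode
```

Because the frame total is itself approximate, the resulting average of roughly 150 frames per episode is only an estimate; the abstract does not state the actual per-episode lengths or the recording rate.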