End-to-End Visual Autonomous Parking via Control-Aided Attention

📅 2025-09-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing end-to-end parking systems suffer from insufficient coupling between perception and control, leading to spatiotemporal instability in self-attention mechanisms and degrading long-term policy reliability. To address this, the authors propose Control-Aided Attention (CAA), the first attention mechanism trained via self-supervision using gradients of the control outputs, thereby guiding visual focus toward the regions most influential for action generation. Integrated within a Transformer-based backbone, CAA is further enhanced by a short-horizon waypoint prediction auxiliary task and a dedicated motion prediction module, all trained under an imitation learning framework for end-to-end vision-based autonomous parking. Evaluated in CARLA simulation, the approach significantly outperforms both end-to-end baselines and modular BEV-based pipelines coupled with Hybrid A* planning, achieving substantial improvements in parking success rate, localization accuracy, and system stability, while also demonstrating superior generalization and interpretability.
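The core idea, attention supervised by gradients of the control outputs rather than by the training loss, can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the function name, the use of summed action magnitudes as the scalar to differentiate, and the softmax normalization are all assumptions made for the sake of a concrete example.

```python
import torch
import torch.nn.functional as F

def control_gradient_attention(features, policy):
    """Hypothetical sketch of Control-Aided Attention (CAA) supervision.

    The attention target is derived from the gradient of the control
    output with respect to the visual features: regions whose
    perturbation most changes the action receive higher weight, so the
    attention focuses on features that induce high variance in the
    action outputs.

    features: (B, C, H, W) visual feature map.
    policy:   module mapping flattened features to control actions.
    """
    features = features.detach().requires_grad_(True)
    actions = policy(features.flatten(1))            # (B, A) control outputs
    # A scalar built from the actions; its gradient w.r.t. the feature
    # map highlights control-relevant spatial locations.
    grad = torch.autograd.grad(actions.abs().sum(), features)[0]
    saliency = grad.abs().sum(dim=1)                 # (B, H, W)
    # Normalize into a spatial distribution that can serve as a
    # self-supervised target for a learned attention head.
    attn_target = F.softmax(saliency.flatten(1), dim=1)
    return attn_target.view_as(saliency)
```

A learned attention module would then be trained to match `attn_target` (e.g. with a KL-divergence loss), decoupling attention supervision from the main imitation loss.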

📝 Abstract
Precise parking requires an end-to-end system in which perception adaptively provides policy-relevant details, especially in critical areas where fine control decisions are essential. End-to-end learning offers a unified framework by directly mapping sensor inputs to control actions, but existing approaches lack effective synergy between perception and control. We find that transformer-based self-attention, when used alone, tends to produce unstable and temporally inconsistent spatial attention, which undermines the reliability of downstream policy decisions over time. Instead, we propose CAA-Policy, an end-to-end imitation learning system that allows the control signal to guide the learning of visual attention via a novel Control-Aided Attention (CAA) mechanism. For the first time, we train such an attention module in a self-supervised manner, using backpropagated gradients from the control outputs instead of from the training loss. This strategy encourages the attention to focus on visual features that induce high variance in action outputs, rather than merely minimizing the training loss, a shift we demonstrate leads to a more robust and generalizable policy. To further enhance stability, CAA-Policy integrates short-horizon waypoint prediction as an auxiliary task, and introduces a separately trained motion prediction module to robustly track the target spot over time. Extensive experiments in the CARLA simulator show that CAA-Policy consistently surpasses both the end-to-end learning baseline and the modular BEV segmentation + Hybrid A* pipeline, achieving superior accuracy, robustness, and interpretability. Code is released at https://github.com/Joechencc/CAAPolicy.
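The abstract also describes short-horizon waypoint prediction as an auxiliary task alongside the main control output. One plausible way to combine the two objectives under imitation learning is a weighted multi-task loss; the head layout, dimensions, and `aux_weight` value below are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAAPolicyHead(nn.Module):
    """Hypothetical multi-task head: control action + short-horizon waypoints."""

    def __init__(self, feat_dim=256, n_waypoints=4, action_dim=2):
        super().__init__()
        self.control = nn.Linear(feat_dim, action_dim)         # e.g. steer, accel
        self.waypoints = nn.Linear(feat_dim, n_waypoints * 2)  # (x, y) per step

    def forward(self, feats):
        return self.control(feats), self.waypoints(feats)

def imitation_loss(pred_action, pred_wp, gt_action, gt_wp, aux_weight=0.5):
    # L1 imitation loss on the actions plus an auxiliary waypoint term;
    # aux_weight is an assumed hyperparameter, not taken from the paper.
    action_loss = F.l1_loss(pred_action, gt_action)
    wp_loss = F.l1_loss(pred_wp, gt_wp)
    return action_loss + aux_weight * wp_loss
```

The auxiliary term gives the shared features a denser spatial training signal than the control labels alone, which is one common rationale for waypoint auxiliary tasks in driving policies.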
Problem

Research questions and friction points this paper is trying to address.

Enhancing perception-control synergy in autonomous parking systems
Stabilizing spatial attention for reliable policy decisions
Improving parking accuracy and robustness via control-guided attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Control-Aided Attention mechanism guides perception
Self-supervised training using control output gradients
Auxiliary waypoint prediction enhances policy stability