🤖 AI Summary
Current visual docking methods for mobile robots suffer from sensitivity to initial pose and reliance on high-precision localization. To address this, we propose DVDP—a fully end-to-end visual docking framework that directly regresses physically feasible, smooth docking trajectories from stereo RGB-D images, without requiring prior pose estimation or hand-crafted modules. Our key contributions include: (1) a large-scale, photorealistic RGB-D docking dataset synthesized via Unity 3D simulation and validated on the SCOUT Mini real-world platform, enabling robust cross-domain learning; (2) a task-specific quantitative evaluation metric suite tailored for docking performance; and (3) a novel deep network architecture supporting domain-adaptive policy learning. Evaluated on physical hardware, DVDP achieves centimeter-level docking accuracy and strong robustness under varying lighting, occlusion, and initialization conditions—outperforming state-of-the-art methods in both precision and generalization.
📝 Abstract
Automatic docking has long been a significant challenge in mobile robotics. Compared with other automatic docking approaches, visual docking offers higher precision and lower deployment cost, making it an efficient and promising choice for this task. However, existing visual docking methods impose strict requirements on the robot's initial position at the start of the docking process. To overcome these limitations, we propose an innovative end-to-end visual docking method named DVDP (Direct Visual Docking Policy). It requires only a binocular RGB-D camera mounted on the mobile robot and directly outputs the robot's docking path, achieving end-to-end automatic docking. Furthermore, we collected a large-scale mobile-robot visual docking dataset by combining virtual environments built on the Unity 3D platform with recordings from an actual mobile robot setup. We also developed a suite of evaluation metrics to quantify the performance of end-to-end visual docking methods. Extensive experiments, including benchmarks against leading perception backbones adapted into our framework, demonstrate that our method achieves superior performance. Finally, real-world deployment on the SCOUT Mini confirmed DVDP's efficacy: the model generates smooth, feasible docking trajectories that satisfy physical constraints and reach the target pose.
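The abstract mentions a suite of evaluation metrics for end-to-end docking but does not define them here. As a minimal sketch, the following Python function computes three plausible docking metrics (final position error, final heading error, and a heading-change smoothness score); the function name, trajectory format, and metric definitions are our assumptions, not the paper's specification.

```python
import math

def docking_metrics(traj, goal):
    """Hypothetical docking-evaluation metrics (illustrative only).

    traj: list of (x, y, theta) poses along the executed path.
    goal: target (x, y, theta) docking pose.
    Returns (position error [m], heading error [rad], smoothness [rad/step]).
    """
    xf, yf, thf = traj[-1]
    gx, gy, gth = goal
    # Euclidean distance between final pose and goal pose
    pos_err = math.hypot(xf - gx, yf - gy)
    # Heading error wrapped into [-pi, pi]
    head_err = abs((thf - gth + math.pi) % (2 * math.pi) - math.pi)
    # Smoothness: mean absolute heading change between consecutive poses
    dths = [abs((b[2] - a[2] + math.pi) % (2 * math.pi) - math.pi)
            for a, b in zip(traj, traj[1:])]
    smooth = sum(dths) / len(dths) if dths else 0.0
    return pos_err, head_err, smooth

# Example: a straight approach that stops 2 cm short of the goal
traj = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.98, 0.0, 0.0)]
pos_err, head_err, smooth = docking_metrics(traj, (1.0, 0.0, 0.0))
```

A metric suite like this lets centimeter-level accuracy claims be checked directly: in the example above the final position error is 0.02 m with zero heading error.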