DVDP: An End-to-End Policy for Mobile Robot Visual Docking with RGB-D Perception

📅 2025-09-16
🤖 AI Summary
Current visual docking methods for mobile robots suffer from sensitivity to initial pose and reliance on high-precision localization. To address this, we propose DVDP—a fully end-to-end visual docking framework that directly regresses physically feasible, smooth docking trajectories from stereo RGB-D images, without requiring prior pose estimation or hand-crafted modules. Our key contributions include: (1) a large-scale, photorealistic RGB-D docking dataset synthesized via Unity 3D simulation and validated on the SCOUT Mini real-world platform, enabling robust cross-domain learning; (2) a task-specific quantitative evaluation metric suite tailored for docking performance; and (3) a novel deep network architecture supporting domain-adaptive policy learning. Evaluated on physical hardware, DVDP achieves centimeter-level docking accuracy and strong robustness under varying lighting, occlusion, and initialization conditions—outperforming state-of-the-art methods in both precision and generalization.

📝 Abstract
Automatic docking has long been a significant challenge in mobile robotics. Compared to other automatic docking methods, visual docking offers higher precision and lower deployment costs, making it an efficient and promising choice for this task. However, existing visual docking methods impose strict requirements on the robot's initial position at the start of the docking process. To overcome the limitations of current vision-based methods, we propose an innovative end-to-end visual docking method named DVDP (Direct Visual Docking Policy). This approach requires only a binocular RGB-D camera mounted on the mobile robot and directly outputs the robot's docking path, achieving end-to-end automatic docking. Furthermore, we have collected a large-scale mobile robot visual docking dataset that combines virtual environments built on the Unity 3D platform with data from a real mobile robot setup. We also developed a suite of evaluation metrics to quantify the performance of end-to-end visual docking methods. Extensive experiments, including benchmarks against leading perception backbones adapted into our framework, demonstrate that our method achieves superior performance. Finally, real-world deployment on the SCOUT Mini confirmed DVDP's efficacy: our model generates smooth, feasible docking trajectories that satisfy physical constraints and reach the target pose.
Problem

Research questions and friction points this paper is trying to address.

Overcoming initial position constraints in visual docking
Developing end-to-end RGB-D perception for robot docking
Generating feasible docking trajectories under physical constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end visual docking with RGB-D
Combines virtual and real data collection
Generates feasible trajectories meeting constraints
Haohan Min
Shenzhen International Graduate School, Tsinghua University
Zhoujian Li
College of Design and Engineering, National University of Singapore
Yu Yang
School of Electrical and Electronic Engineering, Nanyang Technological University
Jinyu Chen
The Hong Kong Polytechnic University
Shenghai Yuan
School of Electrical and Electronic Engineering, Nanyang Technological University