🤖 AI Summary
Existing navigation systems exhibit insufficient localization accuracy—typically constrained to within 1 meter—for fine-grained tasks such as object docking, detection, and manipulation, failing to meet centimeter-level requirements. This work proposes a local navigation method for estimating the full six-degree-of-freedom (6-DoF) relative pose of arbitrary objects, explicitly formulating local navigation as a high-precision pose estimation problem for the first time. The method integrates multimodal perception with a high-accuracy motion prediction network and is trained on large-scale photorealistic simulation data, enabling strong sim-to-real transfer. It supports zero- or few-shot adaptation to novel objects and heterogeneous robotic platforms. Experiments demonstrate that the system achieves an average positional error of <3 cm and an angular error of <3° in real-world scenarios, generalizing effectively to unseen objects and diverse robot configurations without fine-tuning.
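The reported accuracy is stated as error on a 6-DoF relative pose (translation plus rotation). As a minimal sketch of how such metrics are typically computed—`pose_error` is a hypothetical helper, not code from the paper—the positional error is the Euclidean distance between translations, and the angular error is the geodesic angle of the relative rotation:

```python
import numpy as np

def pose_error(t_pred, R_pred, t_goal, R_goal):
    """Positional error (meters) and angular error (degrees)
    between two 6-DoF poses given as (translation, rotation matrix)."""
    # Euclidean distance between the two translation vectors.
    pos_err = np.linalg.norm(np.asarray(t_pred) - np.asarray(t_goal))
    # Geodesic angle of the relative rotation R_pred^T * R_goal.
    R_rel = R_pred.T @ R_goal
    # Clamp the cosine for numerical safety before arccos.
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    ang_err = np.degrees(np.arccos(cos_theta))
    return pos_err, ang_err

# Example: a 2 cm offset with a 2-degree yaw error would fall
# within the reported <3 cm / <3 deg thresholds.
yaw = np.radians(2.0)
R_goal = np.eye(3)
R_pred = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                   [np.sin(yaw),  np.cos(yaw), 0.0],
                   [0.0,          0.0,         1.0]])
pos, ang = pose_error([0.02, 0.0, 0.0], R_pred, [0.0, 0.0, 0.0], R_goal)
```

The paper itself does not specify its rotation representation; rotation matrices are assumed here only for illustration.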
📝 Abstract
Existing navigation systems mostly consider "success" when the robot reaches within a 1 m radius of a goal. This precision is insufficient for emerging applications where the robot needs to be positioned precisely relative to an object for downstream tasks, such as docking, inspection, and manipulation. To this end, we design and implement Aim-My-Robot (AMR), a local navigation system that enables a robot to reach any object in its vicinity at the desired relative pose, with centimeter-level precision. AMR achieves high precision and robustness by leveraging multi-modal perception and precise action prediction, and is trained on large-scale photorealistic data generated in simulation. AMR shows strong sim2real transfer and can adapt to different robot kinematics and unseen objects with little to no fine-tuning.