🤖 AI Summary
Transparent objects pose significant challenges for long-horizon, high-precision robotic manipulation due to their complex optical properties—resulting in poor generalization, reliance on category-specific priors, and limited short-horizon visual observability. To address these issues, we propose a long-horizon transparent-object manipulation framework for single-arm eye-in-hand robots. Our approach comprises three core components: (1) robust perception via joint depth and 6D pose estimation; (2) a lightweight vision-language task planner that jointly interprets natural language instructions and one-shot demonstrations; and (3) a category-agnostic, training-free 6D trajectory generation method that generalizes from a single demonstration. Extensive experiments demonstrate that our framework significantly outperforms prior methods on long-horizon manipulation tasks with previously unseen transparent objects, achieving strong cross-object generalization, sub-millimeter manipulation accuracy, and practical deployability on real robotic systems.
📝 Abstract
Despite the prevalence of transparent object interactions in human everyday life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most generalize poorly to novel objects and are insufficient for precise long-horizon robot manipulation. To address these limitations, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/