DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transparent objects pose significant challenges for long-horizon, high-precision robotic manipulation due to their complex optical properties—resulting in poor generalization, reliance on category-specific priors, and limited short-horizon visual observability. To address these issues, we propose a long-horizon transparent-object manipulation framework for single-arm eye-in-hand robots. Our approach comprises three core components: (1) robust perception via joint depth and 6D pose estimation; (2) a lightweight vision-language task planner that jointly interprets natural language instructions and one-shot demonstrations; and (3) a category-agnostic, training-free 6D trajectory generation method that generalizes from a single demonstration. Extensive experiments demonstrate that our framework significantly outperforms prior methods on long-horizon manipulation tasks with previously unseen transparent objects, achieving strong cross-object generalization, sub-millimeter manipulation accuracy, and practical deployability on real robotic systems.
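
Component (3) in the summary, training-free 6D trajectory generation from one demonstration, can be pictured as re-expressing the demonstrated end-effector motion relative to the object's estimated 6D pose. The sketch below is an illustrative reading, not the authors' released code; the frame conventions and function names are assumptions, and the actual method may add alignment or refinement steps.

```python
# Minimal sketch of single-demonstration 6D trajectory transfer, assuming all
# poses are 4x4 homogeneous transforms in a shared robot base frame. This is
# an illustrative reconstruction, not the paper's implementation.
import numpy as np

def transfer_trajectory(demo_obj_pose, demo_ee_traj, novel_obj_pose):
    """Replay a demonstrated end-effector trajectory relative to a new object.

    demo_obj_pose:  4x4 pose of the demonstration object.
    demo_ee_traj:   list of 4x4 end-effector poses recorded during the demo.
    novel_obj_pose: 4x4 estimated pose of the unseen transparent object.
    """
    demo_obj_inv = np.linalg.inv(demo_obj_pose)
    # Each waypoint is first expressed in the object frame, then mapped into
    # the novel object's frame: T_new = T_obj_new @ (T_obj_demo^-1 @ T_ee).
    return [novel_obj_pose @ (demo_obj_inv @ T_ee) for T_ee in demo_ee_traj]

if __name__ == "__main__":
    # Toy check: same demo, novel object shifted 10 cm along x.
    demo_obj, novel_obj = np.eye(4), np.eye(4)
    novel_obj[:3, 3] = [0.1, 0.0, 0.0]
    waypoint = np.eye(4)
    waypoint[:3, 3] = [0.0, 0.0, 0.2]   # 20 cm above the demo object
    (out,) = transfer_trajectory(demo_obj, [waypoint], novel_obj)
    print(out[:3, 3])                   # [0.1 0.  0.2]: same relative motion
```

Because the transfer depends only on the estimated 6D pose rather than an object model or category prior, it is category-agnostic; this is also why robust pose estimation on transparent objects (component 1) is the prerequisite for the rest of the pipeline.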

📝 Abstract
Despite the prevalence of transparent object interactions in human everyday life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most of them generalize poorly to novel objects and are insufficient for precise long-horizon robot manipulation. To address these limitations, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/
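
The abstract's plan-refinement step is not specified in detail here; the toy example below shows the kind of constraint-aware rewriting a single-arm, eye-in-hand setup could require. Both rules (re-observe a target before a precise action; free the gripper before a new grasp) and all step names are hypothetical illustrations, not the paper's actual planner.

```python
# Toy illustration (not the authors' planner) of refining a VLM-generated plan
# for a single-arm, eye-in-hand robot. Two hypothetical constraints: the wrist
# camera must re-observe a target to estimate its 6D pose before any precise
# action, and one gripper can hold only a single object at a time.
def refine_plan(vlm_plan):
    """vlm_plan: ordered (action, object) pairs, e.g. ("grasp", "flask")."""
    refined, holding = [], None
    for action, obj in vlm_plan:
        if action == "grasp" and holding is not None:
            refined.append(("place", holding))   # free the gripper first
            holding = None
        if action in ("grasp", "place", "pour"):
            refined.append(("observe", obj))     # eye-in-hand: look, then act
        refined.append((action, obj))
        if action == "grasp":
            holding = obj
        elif action == "place":
            holding = None
    return refined

print(refine_plan([("grasp", "flask"), ("pour", "beaker"), ("grasp", "stirrer")]))
# -> observe/grasp flask, observe/pour beaker, place flask, observe/grasp stirrer
```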
Problem

Research questions and friction points this paper is trying to address.

Robotic manipulation of transparent objects remains limited by their optical properties
Enabling precise, long-horizon tasks from a single demonstration
Generalizing manipulation to novel objects without category-level priors or retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates depth and 6D pose estimation for robust transparent-object perception (see the back-projection sketch after this list)
Transfers 6D trajectories from a single demonstration to novel objects, without category priors or training
Refines VLM-generated plans to respect single-arm, eye-in-hand constraints
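
As background for the perception bullet above (standard pinhole-camera math, not code from the paper): a completed metric depth map is typically back-projected into a camera-frame point cloud before a 6D pose estimator registers the object against it. Transparent surfaces corrupt raw sensor depth, which is why depth completion comes first. The intrinsics below are placeholder values.

```python
# Generic pinhole back-projection: completed metric depth -> camera-frame
# point cloud that a 6D pose estimator can consume. Standard background math,
# not the paper's code; intrinsics are placeholders.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth in meters -> (H*W, 3) points in camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

pts = backproject(np.full((480, 640), 0.5), fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(pts.shape)  # (307200, 3)
```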