🤖 AI Summary
Transparent objects pose significant challenges for long-horizon, high-precision robotic manipulation due to their complex optical properties—resulting in poor generalization, reliance on category-specific priors, and limited short-horizon visual observability. To address these issues, we propose a long-horizon transparent-object manipulation framework for single-arm eye-in-hand robots. Our approach comprises three core components: (1) robust perception via joint depth and 6D pose estimation; (2) a lightweight vision-language task planner that jointly interprets natural language instructions and one-shot demonstrations; and (3) a category-agnostic, training-free 6D trajectory generation method that generalizes from a single demonstration. Extensive experiments demonstrate that our framework significantly outperforms prior methods on long-horizon manipulation tasks with previously unseen transparent objects, achieving strong cross-object generalization, sub-millimeter manipulation accuracy, and practical deployability on real robotic systems.
📝 Abstract
Despite the prevalence of transparent object interactions in human everyday life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most generalize poorly to novel objects and are insufficient for precise long-horizon robot manipulation. To address these limitations, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/