🤖 AI Summary
In open-world robotic manipulation, cross-object skill transfer must reconcile high-level structural differences between objects with consistent low-level interaction control. To address this tension, we propose a semantics-aligned zero-shot skill generalization framework that decomposes skills into a prioritized set of task-axis controllers anchored on object keypoints and geometric axes. We introduce the Grounded Task-Axis (GTA) controller paradigm, which enables structured skill decomposition and semantic-level zero-shot transfer, and we use a vision foundation model (SD-DINO) to drive cross-object keypoint alignment and controller redeployment. By integrating keypoint-axis geometric grounding with example-guided transfer, our method executes diverse real-robot tasks, including screw tightening, liquid pouring, and scraping, on previously unseen objects. Experiments demonstrate substantial improvements in robustness to novel objects and generalization across manipulation tasks.
📝 Abstract
Transferring skills between different objects remains one of the core challenges of open-world robot manipulation. Generalization must account for the high-level structural differences between distinct objects while still maintaining similar low-level interaction control. In this paper, we propose an example-based zero-shot approach to skill transfer. Rather than treating skills as atomic, we decompose skills into a prioritized list of grounded task-axis (GTA) controllers. Each GTA controller (GTAC) defines an adaptable controller, such as a position or force controller, along an axis. Importantly, the GTACs are grounded in object keypoints and axes, e.g., the relative position of a screw head or the axis of its shaft. Zero-shot transfer is thus achieved by finding semantically similar grounding features on novel target objects. We achieve this example-based grounding of the skills through the use of foundation models, such as SD-DINO, that can detect semantically similar keypoints of objects. We evaluate our framework on real-robot experiments, including screwing, pouring, and spatula scraping tasks, and demonstrate robust and versatile controller transfer for each.
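The decomposition described above can be sketched in code. The following is a minimal illustrative sketch, not the paper's implementation: the names `GTAController` and `transfer_skill` are hypothetical, and the semantic keypoint matching (done with SD-DINO in the paper) is assumed rather than implemented, with identical keypoint names standing in for detected correspondences.

```python
from dataclasses import dataclass, replace

# Hypothetical sketch of a prioritized grounded task-axis (GTA) controller
# decomposition; names and fields are illustrative, not from the paper's code.

@dataclass(frozen=True)
class GTAController:
    """One grounded task-axis controller in a prioritized skill decomposition."""
    priority: int   # lower value = higher priority in the controller stack
    mode: str       # e.g. "position" or "force"
    keypoint: str   # name of the semantic keypoint the controller is grounded on
    axis: tuple     # task-axis direction in the object frame
    anchor: tuple = (0.0, 0.0, 0.0)  # grounded 3D position of the keypoint

def transfer_skill(skill, target_keypoints):
    """Re-ground each controller on the matched keypoint of a novel object.

    `target_keypoints` maps keypoint names to 3D positions on the new object;
    in the paper this correspondence would come from a semantic matcher such
    as SD-DINO, which is assumed (not implemented) here.
    """
    transferred = []
    for c in sorted(skill, key=lambda c: c.priority):
        if c.keypoint not in target_keypoints:
            raise KeyError(f"no semantic match for keypoint '{c.keypoint}'")
        transferred.append(replace(c, anchor=target_keypoints[c.keypoint]))
    return transferred

# Example: a two-controller screwing skill transferred to an unseen screw.
skill = [
    GTAController(1, "position", "shaft_axis", (0.0, 0.0, 1.0)),
    GTAController(0, "force", "screw_head", (0.0, 0.0, -1.0)),
]
new_skill = transfer_skill(skill, {"screw_head": (0.1, 0.2, 0.3),
                                   "shaft_axis": (0.1, 0.2, 0.4)})
```

The sketch keeps the skill itself object-agnostic: only the `anchor` positions change between source and target objects, while the prioritized controller structure is reused as-is.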