Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the semantic–physical gap between high-level visual-language understanding and low-level robotic manipulation, enabling zero-shot generalization to novel tasks in unstructured environments without task-specific training. Method: We propose Executable Analytic Concepts (EACs), a unified mathematical formalism that jointly encodes object functionality, geometric constraints, and operational semantics, thereby establishing an interpretable, end-to-end mapping from language instructions to executable robot actions. Our framework integrates vision-language models (VLMs), natural language processing (NLP), geometric reasoning, and motion planning to generate grasp poses, force directions, and dynamically feasible trajectories. Contribution/Results: Evaluated in both simulation and real-world settings, our approach achieves strong zero-shot generalization across diverse articulated objects under complex natural-language commands, significantly improving manipulation accuracy and environmental adaptability compared to prior methods.

📝 Abstract
Enabling robots to perform precise and generalized manipulation in unstructured environments remains a fundamental challenge in embodied AI. While Vision-Language Models (VLMs) have demonstrated remarkable capabilities in semantic reasoning and task planning, a significant gap persists between their high-level understanding and the precise physical execution required for real-world manipulation. To bridge this "semantic-to-physical" gap, we introduce GRACE, a novel framework that grounds VLM-based reasoning through executable analytic concepts (EACs): mathematically defined blueprints that encode object affordances, geometric constraints, and the semantics of manipulation. Our approach integrates a structured policy scaffolding pipeline that turns natural language instructions and visual information into an instantiated EAC, from which we derive grasp poses and force directions and plan physically feasible motion trajectories for robot execution. GRACE thus provides a unified and interpretable interface between high-level instruction understanding and low-level robot control, effectively enabling precise and generalizable manipulation through semantic-physical grounding. Extensive experiments demonstrate that GRACE achieves strong zero-shot generalization across a variety of articulated objects in both simulated and real-world environments, without requiring task-specific training.
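To make the idea concrete, the sketch below shows what an EAC-style blueprint for an articulated part might look like: a structure binding an affordance label to geometric constraints (a grasp point and an articulation axis), from which a force direction and a feasible trajectory can be derived analytically. All names and fields here are illustrative assumptions based on the abstract, not the paper's actual formalism or API.

```python
# Hypothetical sketch of an Executable Analytic Concept (EAC) for a
# revolute articulated part (e.g. a cabinet door). Illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class ExecutableAnalyticConcept:
    """Analytic blueprint binding manipulation semantics to geometry."""
    affordance: str            # e.g. "rotate", "pull", "press"
    grasp_point: np.ndarray    # 3D point on the graspable part
    axis_origin: np.ndarray    # a point on the articulation axis
    axis_direction: np.ndarray # unit vector along the articulation axis

    def force_direction(self) -> np.ndarray:
        """Tangential push/pull direction for a revolute joint: axis x lever arm."""
        lever = self.grasp_point - self.axis_origin
        f = np.cross(self.axis_direction, lever)
        return f / np.linalg.norm(f)

    def trajectory(self, angle: float, steps: int = 10) -> list:
        """Waypoints tracing a circular arc of the grasp point about the axis."""
        lever = self.grasp_point - self.axis_origin
        k = self.axis_direction
        waypoints = []
        for t in np.linspace(0.0, angle, steps):
            # Rodrigues' rotation of the lever arm about axis k by angle t
            rotated = (lever * np.cos(t)
                       + np.cross(k, lever) * np.sin(t)
                       + k * np.dot(k, lever) * (1.0 - np.cos(t)))
            waypoints.append(self.axis_origin + rotated)
        return waypoints

# Example: a door hinged about the z-axis, handle 0.4 m from the hinge
eac = ExecutableAnalyticConcept(
    affordance="rotate",
    grasp_point=np.array([0.4, 0.0, 0.0]),
    axis_origin=np.array([0.0, 0.0, 0.0]),
    axis_direction=np.array([0.0, 0.0, 1.0]),
)
force = eac.force_direction()          # tangential direction at the handle
waypoints = eac.trajectory(np.pi / 2)  # arc to open the door by 90 degrees
```

In the paper's pipeline, the VLM's role would be to instantiate such a structure from an image and an instruction; the analytic derivations then stay deterministic and interpretable.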
Problem

Research questions and friction points this paper is trying to address.

Bridging semantic reasoning to precise physical robot execution
Enabling zero-shot generalization for manipulation without task-specific training
Translating natural language instructions into executable geometric constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

GRACE framework bridges semantic reasoning with physical execution
Executable analytic concepts encode object affordances and constraints
Structured policy scaffolding converts instructions into robot trajectories
Mingyang Sun
Shanghai Innovation Institute, Westlake University
Jiude Wei
Shanghai Jiao Tong University
Qichen He
Shanghai Innovation Institute, Shanghai Jiao Tong University
Donglin Wang
Westlake University
Cewu Lu
Shanghai Innovation Institute, Shanghai Jiao Tong University
Jianhua Sun
Shanghai Jiao Tong University
Computer Vision · Robot Learning