GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance

📅 2024-10-09
📈 Citations: 1
Influential: 0
🤖 AI Summary
Embodied agents struggle to understand low-level physical dynamics, generalize across tasks, and adapt zero-shot to new environments, particularly when no task-specific demonstrations or customized simulation environments are available. This paper proposes a multi-role, collaborative, online embodied-agent framework built around a role-division paradigm: it combines vision-language models (VLMs), embodied reasoning agents, real-time visuomotor closed-loop control, and a modular dialogue architecture to semantically ground policies online and dynamically recalibrate their action distributions. The framework deploys across tasks, environments, and hardware platforms without new demonstrations, simulation retraining, or hardware adaptation. Evaluated on both simulated and real robotic platforms, it achieves significantly higher manipulation success rates, demonstrating strong robustness and zero-shot adaptability.

📝 Abstract
Robot learning approaches such as behavior cloning and reinforcement learning have shown great promise in synthesizing robot skills from human demonstrations in specific environments. However, these approaches often require task-specific demonstrations or designing complex simulation environments, which limits the development of generalizable and robust policies for unseen real-world settings. Recent advances in the use of foundation models for robotics (e.g., LLMs, VLMs) have shown great potential in enabling systems to understand the semantics in the world from large-scale internet data. However, it remains an open challenge to use this knowledge to enable robotic systems to understand the underlying dynamics of the world, to generalize policies across different tasks, and to adapt policies to new environments. To alleviate these limitations, we propose an agentic framework for robot self-guidance and self-improvement, which consists of a set of role-specialized conversational agents, such as a high-level advisor, a grounding agent, a monitoring agent, and a robotic agent. Our framework iteratively grounds a base robot policy to relevant objects in the environment and uses visuomotor cues to shift the action distribution of the policy to more desirable states, online, while remaining agnostic to the subjective configuration of a given robot hardware platform. We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates, both in simulation and in real-world experiments, without the need for additional human demonstrations or extensive exploration. Code and videos available at: https://agenticrobots.github.io
Problem

Research questions and friction points this paper is trying to address.

Generalizing robot policies across different tasks
Adapting policies to new environments dynamically
Reducing reliance on task-specific human demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic framework with role-specialized conversational agents
Online visuomotor grounding for policy adaptation
Hardware-agnostic robot self-guidance and improvement
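The role-specialized loop above (advisor, grounding agent, monitoring agent, robotic agent) can be sketched as a minimal message-passing cycle. The roles, message shapes, and success check here are illustrative assumptions; the paper's agents are VLM/LLM-backed conversational agents, not rule-based functions.

```python
# Hypothetical sketch of the advisor -> grounder -> robot -> monitor cycle.

def advisor(task):
    """High-level advisor: decompose the task into subgoals."""
    return [f"reach {task['object']}", f"grasp {task['object']}"]

def grounder(subgoal, scene):
    """Grounding agent: resolve the subgoal's object to a scene entity."""
    name = subgoal.split()[-1]
    return scene.get(name)  # e.g. an object pose, or None if absent

def robot_agent(subgoal, grounding):
    """Robotic agent: pretend to execute and report an outcome."""
    return {"subgoal": subgoal, "done": grounding is not None}

def monitor(outcome):
    """Monitoring agent: decide whether to continue or replan."""
    return "continue" if outcome["done"] else "replan"

def run(task, scene):
    log = []
    for subgoal in advisor(task):
        outcome = robot_agent(subgoal, grounder(subgoal, scene))
        log.append((subgoal, monitor(outcome)))
    return log

log = run({"object": "mug"}, {"mug": (0.4, -0.2, 0.3)})
```

Because the robot's role is reduced to executing grounded subgoals and reporting outcomes, the loop itself stays agnostic to the hardware platform, which mirrors the hardware-agnostic claim above.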