GRAPE: Generalizing Robot Policy via Preference Alignment

📅 2024-11-28

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Vision-language-action (VLA) models generalize poorly in robotic manipulation and suffer from distribution shift because they rely too heavily on successful expert demonstrations. To address this, we propose a trajectory-level preference alignment framework with two parts: (1) failure-trajectory-driven implicit reward modeling, presented as the first effort to leverage failed demonstrations for reward learning, which overcomes the limitation of success-only supervision; and (2) a customizable, multi-stage spatiotemporal constraint mechanism that uses VLM-guided keypoint detection to flexibly align the policy with safety, efficiency, and task success. Evaluated in both real-world and simulated settings, the method improves on state-of-the-art VLA models: +51.79% success rate on in-domain tasks, +58.20% on unseen tasks, −37.44% collision rate, and −11.15% average trajectory length.

📝 Abstract

Despite recent advances of vision-language-action (VLA) models on a variety of robotics tasks, they suffer from critical issues such as poor generalizability to unseen tasks, due to their reliance on behavior cloning exclusively from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, which introduces distribution bias and limits their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs on a trajectory level and implicitly models reward from both successful and failed trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 58.20%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 37.44% and rollout step-length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/
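One way to read "aligns VLAs on a trajectory level" is as a DPO-style preference loss applied to whole rollouts rather than to single actions. The sketch below assumes that form; the abstract does not state the exact objective, and the function names, the `beta` temperature, and the per-step log-probability inputs are all illustrative assumptions rather than the authors' implementation.

```python
import math


def trajectory_log_prob(step_log_probs):
    """Sum per-step action log-probabilities into one trajectory log-likelihood."""
    return sum(step_log_probs)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(policy_win, policy_lose, ref_win, ref_lose, beta=0.1):
    """DPO-style loss for one (preferred, rejected) trajectory pair.

    Each argument is a list of per-step action log-probabilities, under
    either the policy being trained or a frozen reference policy.
    (Assumed form; `beta` is a hypothetical temperature.)
    """
    # Implicit reward of a trajectory: how much more likely the policy
    # makes it than the reference does.
    reward_win = trajectory_log_prob(policy_win) - trajectory_log_prob(ref_win)
    reward_lose = trajectory_log_prob(policy_lose) - trajectory_log_prob(ref_lose)
    margin = beta * (reward_win - reward_lose)
    # -log(sigmoid(margin)) written as softplus(-margin) for stability.
    return softplus(-margin)
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that value as the policy shifts probability toward the preferred trajectory and away from the rejected one.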

Problem

Research questions and friction points this paper is trying to address.

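VLA policies generalize poorly to unseen manipulation tasks

Behavior cloning from successful expert rollouts alone does not align policies with objectives such as safety and efficiency

Complex manipulation tasks are hard to supervise without breaking them into stages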

Innovation

Methods, ideas, or system contributions that make the work stand out.

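Aligns VLAs on a trajectory level

Implicitly models reward from both successful and failed trials

Breaks down complex tasks into independent stages, guided by spatiotemporal constraints on VLM-proposed keypoints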
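As a rough illustration of how stage-wise constraints could turn rollouts into preference pairs, the sketch below scores each trajectory stage by path length (efficiency) and proximity to an obstacle keypoint (safety), then picks the lowest-cost and highest-cost rollouts as the preferred and rejected pair. Everything here is an assumption for illustration: the cost terms, the weights, the `safe_radius`, and the hard-coded obstacle keypoint, which in the paper's setting would be proposed by a vision-language model.

```python
import math


def collision_cost(point, obstacle, safe_radius=0.1):
    """Penalty that grows as a point moves inside the obstacle's safety radius."""
    return max(0.0, safe_radius - math.dist(point, obstacle))


def trajectory_cost(stages, obstacle, length_weight=1.0, collision_weight=10.0):
    """Sum a weighted cost over stages; each stage is a list of 3D positions.

    The weights are hypothetical knobs: raising `collision_weight` favors
    safety, raising `length_weight` favors shorter, more efficient paths.
    """
    total = 0.0
    for points in stages:
        length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
        collision = sum(collision_cost(p, obstacle) for p in points)
        total += length_weight * length + collision_weight * collision
    return total


def pick_preference_pair(trajectories, obstacle):
    """Return (preferred, rejected): the lowest- and highest-cost rollouts."""
    ranked = sorted(trajectories, key=lambda t: trajectory_cost(t, obstacle))
    return ranked[0], ranked[-1]
```

With these weights, a slightly longer rollout that stays outside the safety radius is preferred over a shorter one that passes through the obstacle keypoint; changing the weights changes which objective the resulting preference pairs encode.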

🔎 Similar Papers

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey