🤖 AI Summary
This paper proposes EAGLE to address three challenges in embodied intelligence: poor cross-environment generalization of visuomotor policies, the high cost of acquiring large-scale labeled data, and the structural degradation of task-relevant features caused by global data augmentation. Methodologically, EAGLE introduces (1) control-aware, mask-guided local augmentation, enabled by self-supervised identification of the image regions critical for motor control, and (2) a lightweight visuomotor policy knowledge-distillation mechanism that enables zero-shot transfer without fine-tuning. Evaluated on the DMControl generalization benchmark, an enhanced robot-manipulation perturbation benchmark, and a long-horizon drawer-opening task, EAGLE improves average generalization performance by 23% over state-of-the-art methods and accelerates training convergence by 1.8×.
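The distillation mechanism in (2) can be illustrated with a toy sketch. This is not the paper's architecture: it assumes linear policies and a mean-squared action-matching loss, with the expert frozen as the teacher, purely to show the expert-to-student transfer pattern.

```python
import numpy as np

def distill_step(student_w, expert_w, obs_batch, lr=0.05):
    """One gradient step matching the student's actions to the frozen expert's.

    Hypothetical linear policies: action = obs @ W. Loss is
    0.5 * mean ||student_action - expert_action||^2 over the batch.
    """
    expert_actions = obs_batch @ expert_w    # teacher targets (no gradient)
    student_actions = obs_batch @ student_w
    err = student_actions - expert_actions
    grad = obs_batch.T @ err / len(obs_batch)  # dL/dW for the MSE loss above
    return student_w - lr * grad

rng = np.random.default_rng(1)
expert_w = rng.normal(size=(8, 2))   # pretrained "expert" policy (frozen)
student_w = np.zeros((8, 2))         # student starts from scratch
obs = rng.normal(size=(64, 8))       # batch of observations
for _ in range(200):
    student_w = distill_step(student_w, expert_w, obs)
# the student's actions converge toward the expert's on this batch
```

In EAGLE the student is then deployed to unseen environments without further fine-tuning; the sketch only captures the supervised action-matching step, not the augmentation pipeline feeding it.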
📝 Abstract
Improving generalization is a key challenge in embodied AI, where obtaining large-scale datasets across diverse scenarios is costly. Traditional weak augmentations, such as cropping and flipping, are insufficient for improving a model's performance in new environments, while stronger existing augmentation methods often disrupt task-relevant information in images, potentially degrading performance. To overcome these challenges, we introduce EAGLE, an efficient training framework for generalizable visuomotor policies that improves upon existing methods by (1) enhancing generalization by applying augmentation only to control-related regions identified through a self-supervised control-aware mask, and (2) improving training stability and efficiency by distilling knowledge from an expert to a visuomotor student policy, which is then deployed to unseen environments without further fine-tuning. Comprehensive experiments on three domains, namely the DMControl Generalization Benchmark, the enhanced Robot Manipulation Distraction Benchmark, and a long-horizon drawer-opening task, validate the effectiveness of our method.
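The mask-guided local augmentation in (1) can be sketched as follows. This is a minimal NumPy illustration, assuming a precomputed binary mask and a random-convolution augmentation as stand-ins; it does not reproduce the paper's self-supervised mask learning, only the idea of restricting a strong augmentation to masked regions while leaving the rest of the observation untouched.

```python
import numpy as np

def masked_augment(obs, mask, augment_fn, rng):
    """Blend: augment only where mask == 1, keep original pixels elsewhere.

    obs: (H, W, 3) float image in [0, 1]; mask: (H, W) binary array.
    `augment_fn` is a hypothetical stand-in for any strong augmentation.
    """
    augmented = augment_fn(obs, rng)
    m = mask[..., None]                      # broadcast mask over channels
    return m * augmented + (1.0 - m) * obs

def random_conv(obs, rng):
    """Strong augmentation example: random 3x3 mixing of color channels."""
    w = rng.normal(size=(3, 3)) / 3.0
    return np.clip(obs @ w, 0.0, 1.0)

rng = np.random.default_rng(0)
obs = rng.random((84, 84, 3))
mask = np.zeros((84, 84))
mask[20:60, 20:60] = 1.0                     # placeholder control-aware mask
out = masked_augment(obs, mask, random_conv, rng)
```

In EAGLE the mask itself is produced by the self-supervised control-aware module rather than fixed by hand, but the blending step above shows why structure outside the masked region cannot be degraded by the augmentation.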