Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

📅 2024-06-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Replacing the visual encoder of a visuomotor policy typically causes a sharp drop in performance because perception and motor control are tightly coupled. Method: This paper proposes Perception Stitching, a paradigm for zero-shot cross-encoder policy transfer. It decouples perceptual representations from motor skills by aligning latent visual features across different visuomotor policies, enforcing modularity so that visual encoders become plug-and-play components. A downstream policy network can then reuse an encoder trained under partially different visual conditions without any fine-tuning. Contribution/Results: Evaluated across diverse simulated and real-world manipulation tasks, the method achieves zero-shot success on real-world visuomotor tasks where all baseline methods fail, easing the generalization bottleneck imposed by tight visuomotor coupling and enabling more flexible, encoder-agnostic policy deployment.

📝 Abstract
Vision-based imitation learning has shown promising capabilities for endowing robots with various motion skills given visual observations. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching, which enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our key idea is to enforce modularity of visual encoders by aligning the latent visual features among different visuomotor policies. Our method disentangles the perceptual knowledge from the downstream motion skills and allows the reuse of visual encoders by directly stitching them to a policy network trained with partially different visual conditions. We evaluate our method in various simulated and real-world manipulation tasks. While baseline methods failed at all attempts, our method achieves zero-shot success in real-world visuomotor tasks. Our quantitative and qualitative analysis of the learned features of the policy network provides further insight into the high performance of our proposed method.
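The core idea above — align latent visual features across policies so that an encoder trained under different visual conditions can be plugged into another policy's network with no fine-tuning — can be illustrated with a toy linear sketch. Everything here (the linear "encoders," the fixed distortion `D` standing in for changed visual conditions, and the least-squares solve standing in for the alignment loss) is a hypothetical simplification for illustration, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed, not from the paper): linear "visual encoders" and a
# linear "policy head" instead of deep networks.
obs_dim, latent_dim, act_dim = 8, 4, 2
W_enc_a = rng.normal(size=(latent_dim, obs_dim))   # encoder trained on condition A

# Condition B renders the same scenes differently; here that is modeled as
# a fixed invertible linear distortion D of the observation.
D = rng.normal(size=(obs_dim, obs_dim))

# Alignment step: fit encoder B so its latents match encoder A's on paired
# observations of the same scenes (a least-squares solve stands in for the
# latent-alignment loss used during training).
obs = rng.normal(size=(obs_dim, 100))              # paired observations
obs_b = D @ obs                                    # same scenes, new visuals
target = W_enc_a @ obs                             # latents to align to
X, *_ = np.linalg.lstsq(obs_b.T, target.T, rcond=None)
W_enc_b = X.T

# Policy head trained only on encoder A's latents.
W_policy = rng.normal(size=(act_dim, latent_dim))

# Zero-shot stitching: plug encoder B into the policy head, no fine-tuning.
new_obs = rng.normal(size=(obs_dim,))
act_a = W_policy @ (W_enc_a @ new_obs)             # original pipeline
act_b = W_policy @ (W_enc_b @ (D @ new_obs))       # stitched pipeline
print(np.allclose(act_a, act_b, atol=1e-6))
```

Because the two encoders agree in latent space, the stitched pipeline reproduces the original policy's actions on the distorted observations; this is the modularity property the alignment is meant to enforce.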
Problem

Research questions and friction points this paper is trying to address.

Visuomotor Robotics
Adaptive Skill Transfer
Perception Encoder
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perception Stitching
Adaptive Robot Policies
Transfer Learning