PALM: Enhanced Generalizability for Local Visuomotor Policies via Perception Alignment

📅 2026-01-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the poor generalization of image-based behavior cloning under out-of-distribution (OOD) conditions, namely workspace shifts, viewpoint changes, and cross-embodiment transfer. It proposes PALM, which modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. By enforcing local visual focus and a consistent proprioceptive representation at the local policy level, the method reduces the discrepancy between in-domain and OOD inputs, so the policy can exploit the invariance of local action distributions and handle these shifts concurrently without additional input modalities, model changes, or data collection. Experiments show OOD performance drops of only 8% in simulation and 24% in the real world, compared with 45% and 77% for baselines.

📝 Abstract

Generalizing beyond the training domain in image-based behavior cloning remains challenging. Existing methods address individual axes of generalization (workspace shifts, viewpoint changes, and cross-embodiment transfer), yet they are typically developed in isolation and often rely on complex pipelines. We introduce PALM (Perception Alignment for Local Manipulation), which leverages the invariance of local action distributions between out-of-distribution (OOD) and demonstrated domains to address these OOD shifts concurrently, without additional input modalities, model changes, or data collection. PALM modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. We reduce the discrepancy between in-domain and OOD inputs at the local policy level by enforcing local visual focus and a consistent proprioceptive representation, allowing the policy to retrieve invariant local actions under OOD conditions. Experiments show that PALM limits OOD performance drops to 8% in simulation and 24% in the real world, compared to 45% and 77% for baselines.
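The abstract does not say how local visual focus or a consistent proprioceptive representation is implemented. The sketch below is only one illustrative reading, not the authors' method: it assumes the local policy sees a fixed-size image crop centered on a known pixel (for example, the projected end-effector location) and a position expressed relative to a task anchor. The names `local_crop`, `relative_proprio`, `center_uv`, and `anchor_pos` are hypothetical.

```python
import numpy as np


def local_crop(image, center_uv, size):
    """Return a size x size crop centered on pixel (u, v) = (column, row).

    Illustrative assumption: "local visual focus" is approximated by a
    fixed-size crop. The image is zero-padded so that crops near the
    border keep the same shape.
    """
    h, w = image.shape[:2]
    half = size // 2
    # Clamp the center to valid pixel coordinates.
    u = min(max(int(round(center_uv[0])), 0), w - 1)
    v = min(max(int(round(center_uv[1])), 0), h - 1)
    # Pad only the two spatial axes; leave any channel axis untouched.
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    # After padding, original pixel (v, u) sits at (v + half, u + half),
    # so a window starting at (v, u) is centered on it.
    return padded[v:v + size, u:u + size]


def relative_proprio(ee_pos, anchor_pos):
    """Express the end-effector position relative to a task anchor.

    Illustrative assumption: a "consistent proprioceptive representation"
    is approximated by removing the absolute workspace offset.
    """
    return np.asarray(ee_pos, dtype=float) - np.asarray(anchor_pos, dtype=float)
```

Under these assumptions, translating the whole scene (object and gripper together) leaves both the crop and the relative position unchanged, which is the kind of input alignment that would let a local policy reuse demonstrated actions after a workspace shift. Viewpoint and embodiment changes would need more than a translation-invariant crop; this sketch does not cover them.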

Problem

Research questions and friction points this paper is trying to address.

generalization

out-of-distribution

visuomotor policy

behavior cloning

domain shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

Perception Alignment

Local Visuomotor Policy

Out-of-Distribution Generalization

Behavior Cloning

Modular Policy

🔎 Similar Papers

What Foundation Models can Bring for Robot Learning in Manipulation: A Survey

2024-04-28 · arXiv.org · Citations: 15