Policy Contrastive Decoding for Robotic Foundation Models

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robotics foundation models often learn spurious visual-action correlations from pretraining trajectories, severely limiting cross-domain generalization. To address this, we propose Policy Contrastive Decoding (PCD), a training-free, post-hoc method that steers policy attention toward genuine causal cues by masking salient objects and contrasting the resulting action probability distributions before and after the perturbation. PCD requires no fine-tuning, architectural modification, or weight access; it is fully plug-and-play and compatible with both autoregressive (e.g., OpenVLA) and diffusion-based (e.g., Octo, π₀) policies. Extensive experiments in simulation and on real robots demonstrate substantial improvements in generalization: PCD boosts the performance of the π₀ policy by 8% in simulation and 108% on physical hardware. These results validate PCD as a universal, lightweight enhancement for diverse open-source robotic policies.

📝 Abstract
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose, and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
Problem

Research questions and friction points this paper is trying to address.

Robotic policies learn spurious correlations from pre-training data
Improving generalization by focusing on object-relevant visual clues
Enhancing robot policies without finetuning or accessing model weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrasting action distributions from original and masked inputs
Training-free plugin for various robot policies
Improves generalization by focusing on object-relevant clues
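The core contrastive step can be sketched in a few lines. Note this is a minimal illustration of contrastive decoding over discretized action tokens (as used by autoregressive policies such as OpenVLA), not the paper's exact formulation: the combination rule `(1 + alpha) * log p_orig - alpha * log p_masked` and the contrast weight `alpha` are assumptions borrowed from the general contrastive-decoding literature. The idea is that action tokens whose probability collapses when the salient object is masked are genuinely object-driven, and the contrast boosts them relative to tokens supported by spurious background cues.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def contrastive_decode(logits_orig, logits_masked, alpha=2.0):
    """Contrast action logits from original vs. object-masked inputs.

    Hypothetical combination rule (an assumption, not the paper's exact
    formula): amplify the log-probability shift caused by masking, so
    tokens that depend on the masked object are favored over tokens
    supported by context alone.
    """
    lp_orig = log_softmax(np.asarray(logits_orig, dtype=float))
    lp_masked = log_softmax(np.asarray(logits_masked, dtype=float))
    return (1 + alpha) * lp_orig - alpha * lp_masked

# Toy example with 4 discretized action tokens.
# Token 0 stays strong even after masking (spurious background cue);
# token 1 drops sharply when the object is masked (object-driven cue).
orig = [2.0, 1.0, 0.5, 0.1]
masked = [2.0, 0.2, 0.5, 0.1]
scores = contrastive_decode(orig, masked)
best = int(np.argmax(scores))  # the object-driven token wins the contrast
```

With these toy logits, plain greedy decoding on `orig` would pick token 0, while the contrast redirects the choice to token 1, whose probability depends on the (now masked) object.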