Policy Contrastive Decoding for Robotic Foundation Models

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robotics foundation models often learn spurious visual-action correlations from pretraining trajectories, severely limiting cross-domain generalization. To address this, we propose Policy Contrastive Decoding (PCD), a training-free, post-hoc method that steers policy attention toward genuine causal cues by masking salient objects and contrasting the resulting action probability distributions before and after the perturbation. PCD requires no fine-tuning, architectural modification, or weight access; it is fully plug-and-play and compatible with both autoregressive (e.g., OpenVLA) and diffusion-based (e.g., Octo, π₀) policies. Extensive experiments in simulation and on real robots demonstrate substantial improvements in generalization: PCD boosts the performance of the π₀ policy by 8% in simulation and 108% on physical hardware. These results validate PCD as a universal, lightweight enhancement for diverse open-source robotic policies.

📝 Abstract
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose, and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
Problem

Research questions and friction points this paper is trying to address.

Robotic policies learn spurious correlations from pre-training data
Improving generalization by focusing on object-relevant visual clues
Enhancing robot policies without finetuning or accessing model weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrasting action distributions from original and masked inputs
Training-free plugin for various robot policies
Improves generalization by focusing on object-relevant clues
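The core contrastive step can be sketched in a few lines. Note this is a minimal illustration of contrastive decoding over discretized action tokens (as used by autoregressive policies such as OpenVLA), not the paper's exact formulation: the combination rule `(1 + alpha) * log p_orig - alpha * log p_masked` and the contrast weight `alpha` are assumptions borrowed from the general contrastive-decoding literature. The idea is that action tokens whose probability collapses when the salient object is masked are genuinely object-driven, and the contrast boosts them relative to tokens supported by spurious background cues.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def contrastive_decode(logits_orig, logits_masked, alpha=2.0):
    """Contrast action logits from original vs. object-masked inputs.

    Hypothetical combination rule (an assumption, not the paper's exact
    formula): amplify the log-probability shift caused by masking, so
    tokens that depend on the masked object are favored over tokens
    supported by context alone.
    """
    lp_orig = log_softmax(np.asarray(logits_orig, dtype=float))
    lp_masked = log_softmax(np.asarray(logits_masked, dtype=float))
    return (1 + alpha) * lp_orig - alpha * lp_masked

# Toy example with 4 discretized action tokens.
# Token 0 stays strong even after masking (spurious background cue);
# token 1 drops sharply when the object is masked (object-driven cue).
orig = [2.0, 1.0, 0.5, 0.1]
masked = [2.0, 0.2, 0.5, 0.1]
scores = contrastive_decode(orig, masked)
best = int(np.argmax(scores))  # the object-driven token wins the contrast
```

With these toy logits, plain greedy decoding on `orig` would pick token 0, while the contrast redirects the choice to token 1, whose probability depends on the (now masked) object.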