Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

📅 2025-11-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pretrained vision-language-action (VLA) models often suffer significant performance degradation in real-world robot deployment due to distribution shift, while existing fine-tuning approaches demand extensive demonstration data and substantial computation, limiting their practicality. This paper introduces VLA-Pilot, a plug-and-play inference-time policy steering framework that requires neither fine-tuning nor additional data collection. Its core is an embodied evolutionary diffusion mechanism that, at inference time, jointly optimizes action sequences through iterative evolutionary search and diffusion-based priors conditioned on visual-linguistic context, enabling closed-loop control. Evaluated on six real-world manipulation tasks across two distinct robot embodiments, covering both in-distribution and out-of-distribution scenarios, VLA-Pilot substantially improves the success rates of off-the-shelf pretrained VLA policies, demonstrating robust zero-shot generalization to diverse tasks and embodiments.
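The paper itself provides code at the project page linked below; the sketch here is only an illustration of the steering idea described above: evolutionary search over action sequences sampled from a frozen diffusion-based VLA policy, scored under the embodied context and re-planned in a closed loop. All names (`vla_policy.sample`, `score_fn`) and hyperparameters are assumptions for illustration, not the authors' actual interfaces.

```python
import numpy as np

def sample_action_sequences(vla_policy, observation, instruction, num_candidates):
    """Draw candidate action sequences from the frozen pretrained VLA policy (hypothetical API)."""
    return [vla_policy.sample(observation, instruction) for _ in range(num_candidates)]

def steer_actions(vla_policy, score_fn, observation, instruction,
                  num_candidates=16, num_generations=3, elite_frac=0.25, noise_scale=0.05):
    """Inference-time steering sketch: evolutionary refinement of diffusion-sampled
    action sequences, keeping the pretrained policy weights untouched."""
    population = sample_action_sequences(vla_policy, observation, instruction, num_candidates)
    for _ in range(num_generations):
        # Score each candidate under the embodied context (e.g., estimated task progress).
        scores = np.array([score_fn(observation, instruction, a) for a in population])
        elite_count = max(1, int(elite_frac * len(population)))
        elites = [population[i] for i in np.argsort(scores)[-elite_count:]]
        # Refill the population by perturbing elites; the diffusion prior could
        # alternatively re-denoise the mutated sequences.
        population = list(elites)
        while len(population) < num_candidates:
            parent = elites[np.random.randint(elite_count)]
            population.append(parent + noise_scale * np.random.randn(*parent.shape))
    final_scores = [score_fn(observation, instruction, a) for a in population]
    return population[int(np.argmax(final_scores))]  # execute, observe, then re-plan
```

In a closed-loop deployment, the returned sequence would be executed for a short horizon, after which the robot re-observes the scene and the search is run again from the new observation.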

📝 Abstract
Vision-Language-Action (VLA) models have demonstrated significant potential in real-world robotic manipulation. However, pre-trained VLA policies still suffer from substantial performance degradation during downstream deployment. Although fine-tuning can mitigate this issue, its reliance on costly demonstration collection and intensive computation makes it impractical in real-world settings. In this work, we introduce VLA-Pilot, a plug-and-play inference-time policy steering method for zero-shot deployment of pre-trained VLA without any additional fine-tuning or data collection. We evaluate VLA-Pilot on six real-world downstream manipulation tasks across two distinct robotic embodiments, encompassing both in-distribution and out-of-distribution scenarios. Experimental results demonstrate that VLA-Pilot substantially boosts the success rates of off-the-shelf pre-trained VLA policies, enabling robust zero-shot generalization to diverse tasks and embodiments. Experimental videos and code are available at: https://rip4kobe.github.io/vla-pilot/.
Problem

Research questions and friction points this paper is trying to address.

Pre-trained VLA policies degrade substantially under distribution shift during downstream deployment
Fine-tuning mitigates this but requires costly demonstration collection and intensive computation
Robust manipulation success across diverse tasks and embodiments is needed without additional training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play inference-time policy steering method
Deploys pre-trained VLA policies without fine-tuning or data collection
Uses embodied evolutionary diffusion for zero-shot generalization
Zhuo Li
Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
Junjia Liu
Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
Zhipeng Dong
Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
Tao Teng
Istituto Italiano di Tecnologia (IIT)
Quentin Rouxel
CUHK
Darwin Caldwell
Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
Fei Chen
Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong