🤖 AI Summary
This work addresses the challenge of continual self-improvement for vision-language-action (VLA) models in real-world deployment. We propose RECAP, an advantage-conditioned policy framework that unifies offline demonstration data, online robot interaction data, and expert teleoperation interventions through offline pretraining followed by closed-loop online optimization, enabling policy refinement from multiple heterogeneous data sources. Its core innovation is conditioning the policy distribution on the advantage function, which improves robustness to task dynamics and environmental perturbations. Evaluated in real household environments, RECAP successfully executes complex, long-horizon tasks, including folding clothing, assembling cardboard boxes, and operating a professional espresso machine, achieving more than a 2.1× improvement in throughput and a 48% reduction in failure rate on the hardest tasks.
📝 Abstract
We study how vision-language-action (VLA) models can improve through real-world deployments via reinforcement learning (RL). We present a general-purpose method, RL with Experience and Corrections via Advantage-conditioned Policies (RECAP), that provides for RL training of VLAs via advantage conditioning. Our method incorporates heterogeneous data into the self-improvement process, including demonstrations, data from on-policy collection, and expert teleoperated interventions provided during autonomous execution. RECAP starts by pre-training a generalist VLA with offline RL, which we call $\pi^{*}_{0.6}$, that can then be specialized to attain high performance on downstream tasks through on-robot data collection. We show that the $\pi^{*}_{0.6}$ model trained with the full RECAP method can fold laundry in real homes, reliably assemble boxes, and make espresso drinks using a professional espresso machine. On some of the hardest tasks, RECAP more than doubles task throughput and roughly halves the task failure rate.
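The core mechanism described above, conditioning the policy on an advantage signal so that deployment-time conditioning on high advantage steers the model toward above-baseline behavior, can be illustrated with a toy sketch. This is not the paper's implementation: the linear least-squares "policy", the scalar return baseline, and the binary advantage token are all simplifying assumptions standing in for the VLA model, its value estimator, and its conditioning scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset of (state, action, return) triples drawn from mixed-quality
# experience (demonstrations, on-policy rollouts, interventions).
states = rng.normal(size=(256, 4))
actions = rng.normal(size=(256, 2))
returns = rng.normal(size=256)

# 1) Estimate a value baseline and compute advantages (assumption: a simple
#    mean-return baseline in place of a learned value function).
baseline = returns.mean()
advantages = returns - baseline

# 2) Binarize the advantage into a conditioning token (1 = above baseline).
adv_token = (advantages > 0).astype(np.float64)

# 3) Supervised fit of a policy on [state, advantage_token] -> action;
#    a linear least-squares map stands in for training the VLA.
X = np.concatenate([states, adv_token[:, None]], axis=1)
W, *_ = np.linalg.lstsq(X, actions, rcond=None)

def act(state, condition_on_high_advantage=True):
    """At deployment, fix the conditioning token to 'high advantage' so the
    policy produces actions associated with above-baseline outcomes."""
    token = 1.0 if condition_on_high_advantage else 0.0
    x = np.concatenate([state, [token]])
    return x @ W

action = act(states[0])
```

The key point of the sketch is that training sees both good and bad experience labeled by advantage, while execution conditions only on the high-advantage token, which is what lets heterogeneous, mixed-quality data still improve the deployed policy.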