RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Vision-Language-Action (VLA) models struggle with execution drift in long-horizon, contact-rich manipulation tasks because they are trained only on successful demonstrations, and discarding failure trajectories removes the supervision needed for robustness. This work proposes a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. By integrating Recovery-Aware Initialization (RAI), a Progress-Aware Semantic Value Function (PAS-VF), and Value-Conditioned Refinement (VCR), the approach turns adverse states into corrective training signals and steers policy learning toward actions with high task progress. At deployment, the method requires no online failure detection, and it improves disturbance resilience and self-recovery in both simulated and real-world bimanual tasks. In adversarial scenarios, success rates rise from 20% to 75% on average, and reach up to 80% in scaled real-world trials.

📝 Abstract

Vision-Language-Action (VLA) models remain brittle in long-horizon, contact-rich manipulation because success-only imitation provides little supervision for execution drift, while failed rollouts are often discarded. We introduce RePO-VLA, a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. RePO-VLA first applies Recovery-Aware Initialization (RAI), slicing recovery segments and resetting history so corrective actions depend on the current adverse state rather than the preceding failure. It then learns a Progress-Aware Semantic Value Function (PAS-VF), aligning spatiotemporal trajectory features with instructions and successful references. The resulting labels salvage useful failure prefixes via reliability decay, while low-value labels mark drift and terminal breakdowns, teaching differences among nominal, failed, and corrective actions. The data engine turns adverse states into planner-generated or human-collected corrective rollouts, teaching recovery to the success manifold. Value-Conditioned Refinement (VCR) trains the policy to prefer high-progress actions. At deployment, a fixed high value ($v=1.0$) biases actions toward the learned success manifold without online failure detectors or heuristic retries. We introduce FRBench, with standardized error injection and recovery-focused evaluation. Across simulated and real-world bimanual tasks, RePO-VLA improves robustness, raising adversarial success from 20% to 75% on average and up to 80% in scaled real-world trials.
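
The abstract names three mechanisms without giving formulas: slicing recovery segments with history reset, value labels that salvage failure prefixes via reliability decay, and conditioning the policy on a fixed high value at deployment. The Python sketch below is one possible reading, not the authors' implementation. The exponential decay form, the `decay` and `floor` parameters, the `Step` layout, and every function name are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    obs: str                       # placeholder for an observation
    action: str                    # placeholder for an action
    value: Optional[float] = None  # progress label in [0, 1]; None if unlabeled


def slice_recovery(traj: List[Step], start: int, end: int) -> List[Step]:
    """Copy traj[start:end] as a standalone episode (hypothetical RAI-style slice).

    The returned list holds no steps before `start`, so a history-conditioned
    policy trained on it sees the adverse state onward, not the prior failure.
    """
    return [Step(s.obs, s.action, s.value) for s in traj[start:end]]


def label_failure_prefix(
    traj: List[Step], fail_idx: int, decay: float = 0.5, floor: float = 0.0
) -> List[Step]:
    """Label a failed rollout in place (assumed exponential reliability decay).

    Steps well before `fail_idx` keep a label near 1.0, labels shrink as the
    failure approaches, and steps at or after `fail_idx` get `floor`.
    """
    for t, step in enumerate(traj):
        if t >= fail_idx:
            step.value = floor
        else:
            distance = fail_idx - t
            step.value = max(floor, 1.0 - decay ** distance)
    return traj


# Assumed deployment constant, taken from the abstract's fixed v = 1.0.
DEPLOY_VALUE = 1.0


def policy_input(instruction: str, obs: str, value: float) -> Tuple[str, str, float]:
    """Bundle the conditioning signal: labeled value in training, 1.0 at deployment."""
    return (instruction, obs, value)


# Usage: a 5-step rollout that fails at step 3.
rollout = [Step(f"o{i}", f"a{i}") for i in range(5)]
label_failure_prefix(rollout, fail_idx=3)
recovery = slice_recovery(rollout, start=3, end=5)
deploy_query = policy_input("stack the blocks", "o0", DEPLOY_VALUE)
```

In this reading, the early steps of a failed rollout still carry high labels and remain usable as training data, while the sliced recovery segment starts from the adverse state with no earlier history attached.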

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

execution drift

failure recovery

long-horizon manipulation

contact-rich tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Recovery-Driven Policy Optimization

Vision-Language-Action Models

Progress-Aware Semantic Value Function

Recovery-Aware Initialization

Value-Conditioned Refinement

🔎 Similar Papers

VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation

2024-09-19 | Citations: 5

TinyVLA: Toward Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

2024-09-19 | IEEE Robotics and Automation Letters | Citations: 42