Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a key limitation of current vision-language-action (VLA) models: because they are trained by behavioral cloning on successful demonstrations only, they struggle to recover from execution errors, and minor deviations often lead to failure. To overcome this, the authors propose Adaptive Failure-Informed Learning (AFIL), a framework that, for the first time, incorporates online-generated failure trajectories as adaptive negative supervision in end-to-end VLA training. The approach uses Dual Action Generators (DAGs) that share a common vision-language backbone to jointly model successful and failed behaviors. At sampling time, a guidance mechanism driven by the distance between the success and failure distributions steers action generation away from failure-prone regions. The method requires no manually designed failure modes or external recovery interventions and adds limited parameter overhead. It outperforms existing VLA baselines in both in-domain and out-of-domain settings across short- and long-horizon manipulation tasks.

📝 Abstract

Vision-language-action (VLA) models provide a promising paradigm for scalable robotic manipulation, yet their reliance on success-only behavioral cloning leaves them brittle; lacking corrective training signals, minor execution errors rapidly compound into unrecoverable, out-of-distribution failures. To address this limitation, we propose Adaptive Failure-Informed Learning (AFIL), an end-to-end framework that leverages failure trajectories as adaptive negative guidance for diffusion- and flow-based VLA policies. AFIL uses a pretrained VLA to generate failure rollouts online, avoiding the need for handcrafted failure-mode design or human-in-the-loop recovery. It then jointly trains Dual Action Generators (DAGs) for successful and failed behaviors while sharing a common vision-language backbone, enabling efficient failure-aware policy learning with limited parameter overhead. During sampling, the failure generator adaptively steers action generation away from failure-prone regions and toward more reliable success modes, with guidance strength determined by the per-diffusion-step distance between success and failure distributions. Experiments across in-domain and out-of-domain robotic manipulation tasks, covering both short- and long-horizon settings, show that AFIL consistently improves task success rates and robustness over existing VLA baselines, demonstrating its effectiveness, efficiency, and generality.
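
The adaptive negative guidance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the guidance formula, so the update rule (extrapolating the success prediction away from the failure prediction), the saturating weight function, the choice to let the weight grow with distance, and the names `adaptive_guidance_weight`, `guided_velocity`, `w_max`, and `scale` are all assumptions made for illustration. The two input vectors stand in for the per-step outputs of the success and failure generators.

```python
import math


def adaptive_guidance_weight(v_success, v_failure, w_max=2.0, scale=1.0):
    """Hypothetical weighting rule (an assumption, not from the paper).

    The weight is computed from the L2 distance between the success and
    failure predictions at the current denoising step. Here it grows with
    the distance and saturates at w_max; the source does not state the
    actual functional form or its direction.
    """
    dist = math.sqrt(sum((s - f) ** 2 for s, f in zip(v_success, v_failure)))
    return w_max * (1.0 - math.exp(-dist / scale))


def guided_velocity(v_success, v_failure, w_max=2.0, scale=1.0):
    """Push the success prediction away from the failure prediction.

    Assumed form: v = v_success + w * (v_success - v_failure), with w
    recomputed at every step from the current distance.
    """
    w = adaptive_guidance_weight(v_success, v_failure, w_max, scale)
    return [s + w * (s - f) for s, f in zip(v_success, v_failure)]


# Identical predictions give zero distance, so no guidance is applied.
same = guided_velocity([0.5, -0.2], [0.5, -0.2])

# Differing predictions move the result further from the failure prediction.
pushed = guided_velocity([1.0, 0.0], [0.0, 0.0])
```

In this sketch the guided output equals the success prediction whenever the two generators agree, and it is extrapolated away from the failure prediction when they disagree. A real diffusion or flow sampler would call such a function once per denoising step.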

Problem

Research questions and friction points this paper is trying to address.

vision-language-action models

behavioral cloning

failure recovery

distribution shift

robotic manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Failure-Informed Learning

Vision-Language-Action Models

Failure Trajectories