Learning Actionable Manipulation Recovery via Counterfactual Failure Synthesis

📅 2026-03-13
🤖 AI Summary
Autonomous robotic manipulation struggles to recover from failures, hindered by the high cost of real-world data, the sim-to-real gap, and the absence of executable trajectory-level recovery strategies. This work proposes Dream2Fix, a novel framework that leverages a generative world model to synthesize counterfactual failure-recovery trajectory pairs directly from real-world successful demonstrations, eliminating reliance on simulators. To ensure physical plausibility, the synthesized data undergoes structured validation based on task validity, visual consistency, and kinematic safety. A vision-language model is then fine-tuned to enable end-to-end mapping from visual anomalies to precise recovery actions. Using a high-fidelity dataset comprising over 120,000 samples, the approach boosts real-robot recovery accuracy from 19.7% to 81.3%, achieving zero-shot closed-loop failure recovery for the first time.

📝 Abstract
While recent foundation models have significantly advanced robotic manipulation, these systems still struggle to autonomously recover from execution errors. Current failure-learning paradigms rely on either costly and unsafe real-world data collection or simulator-based perturbations, which introduce a severe sim-to-real gap. Furthermore, existing visual analyzers predominantly output coarse, binary diagnoses rather than the executable, trajectory-level corrections required for actual recovery. To bridge the gap between failure diagnosis and actionable recovery, we introduce Dream2Fix, a framework that synthesizes photorealistic, counterfactual failure rollouts directly from successful real-world demonstrations. By perturbing actions within a generative world model, Dream2Fix creates paired failure-correction data without relying on simulators. To ensure the generated data is physically viable for robot learning, we implement a structured verification mechanism that strictly filters rollouts for task validity, visual coherence, and kinematic safety. This engine produces a high-fidelity dataset of over 120k paired samples. Using this dataset, we fine-tune a vision-language model to jointly predict failure types and precise recovery trajectories, mapping visual anomalies directly to corrective actions. Extensive real-world robotic experiments show our approach achieves state-of-the-art correction accuracy, improving from 19.7% to 81.3% over prior baselines, and successfully enables zero-shot closed-loop failure recovery in physical deployments.
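The abstract describes a structured verification stage that filters synthesized rollouts on three criteria: task validity, visual coherence, and kinematic safety. A minimal sketch of such a filter is below; the field names, similarity scores, and thresholds are illustrative assumptions, not the paper's actual criteria.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    """A synthesized failure-recovery rollout (hypothetical structure)."""
    task_completed: bool           # does the recovery end in task success?
    frame_similarity: float        # visual-consistency score in [0, 1]
    joint_velocities: List[float]  # peak joint speeds along the trajectory (rad/s)

# Hypothetical thresholds; the paper does not publish its exact values.
MIN_FRAME_SIMILARITY = 0.8
MAX_JOINT_VELOCITY = 2.0  # rad/s

def is_task_valid(r: Rollout) -> bool:
    # Task validity: the counterfactual recovery must still achieve the goal.
    return r.task_completed

def is_visually_coherent(r: Rollout) -> bool:
    # Visual coherence: generated frames must stay consistent with the scene.
    return r.frame_similarity >= MIN_FRAME_SIMILARITY

def is_kinematically_safe(r: Rollout) -> bool:
    # Kinematic safety: no joint may exceed a safe velocity bound.
    return all(abs(v) <= MAX_JOINT_VELOCITY for v in r.joint_velocities)

def filter_rollouts(rollouts: List[Rollout]) -> List[Rollout]:
    """Keep only rollouts that pass all three structured checks."""
    return [r for r in rollouts
            if is_task_valid(r)
            and is_visually_coherent(r)
            and is_kinematically_safe(r)]
```

In this reading, only rollouts passing every check enter the 120k-sample training set; each check is independent, so failures can also be logged per criterion for diagnostics.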
Problem

Research questions and friction points this paper is trying to address:

- manipulation recovery
- failure diagnosis
- sim-to-real gap
- trajectory-level correction
- robotic error recovery
Innovation

Methods, ideas, or system contributions that make the work stand out:

- counterfactual failure synthesis
- generative world model
- actionable recovery
- structured verification
- vision-language model
Authors

- Dayou Li (Texas A&M University)
- Jiuzhou Lei (Texas A&M University)
- Hao Wang (Texas A&M University)
- Lulin Liu (Texas A&M University, University of Minnesota)
- Yunhao Yang (University of Texas at Austin): Formal methods, Autonomy, Privacy
- Zihan Wang (Abaka AI)
- Bangya Liu (University of Wisconsin-Madison)
- Minghui Zheng (J. Mike Walker '66 Department of Mechanical Engineering, Texas A&M University): Robotics, Planning, Control, Robotic Disassembly, Remanufacturing Automation
- Zhiwen Fan (Texas A&M University)