Learning Actionable Manipulation Recovery via Counterfactual Failure Synthesis

📅 2026-03-13
🤖 AI Summary
Autonomous robotic manipulation struggles to recover from failures, hindered by the high cost of real-world data, the sim-to-real gap, and the absence of executable trajectory-level recovery strategies. This work proposes Dream2Fix, a novel framework that leverages a generative world model to synthesize counterfactual failure-recovery trajectory pairs directly from real-world successful demonstrations, eliminating reliance on simulators. To ensure physical plausibility, the synthesized data undergoes structured validation based on task validity, visual consistency, and kinematic safety. A vision-language model is then fine-tuned to enable end-to-end mapping from visual anomalies to precise recovery actions. Using a high-fidelity dataset comprising over 120,000 samples, the approach boosts real-robot recovery accuracy from 19.7% to 81.3%, achieving zero-shot closed-loop failure recovery for the first time.

📝 Abstract
While recent foundation models have significantly advanced robotic manipulation, these systems still struggle to autonomously recover from execution errors. Current failure-learning paradigms rely on either costly and unsafe real-world data collection or simulator-based perturbations, which introduce a severe sim-to-real gap. Furthermore, existing visual analyzers predominantly output coarse, binary diagnoses rather than the executable, trajectory-level corrections required for actual recovery. To bridge the gap between failure diagnosis and actionable recovery, we introduce Dream2Fix, a framework that synthesizes photorealistic, counterfactual failure rollouts directly from successful real-world demonstrations. By perturbing actions within a generative world model, Dream2Fix creates paired failure-correction data without relying on simulators. To ensure the generated data is physically viable for robot learning, we implement a structured verification mechanism that strictly filters rollouts for task validity, visual coherence, and kinematic safety. This engine produces a high-fidelity dataset of over 120k paired samples. Using this dataset, we fine-tune a vision-language model to jointly predict failure types and precise recovery trajectories, mapping visual anomalies directly to corrective actions. Extensive real-world robotic experiments show our approach achieves state-of-the-art correction accuracy, improving from 19.7% to 81.3% over prior baselines, and successfully enables zero-shot closed-loop failure recovery in physical deployments.
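The abstract describes a structured verification stage that filters synthesized rollouts on three criteria: task validity, visual coherence, and kinematic safety. A minimal sketch of such a filter is below; the field names, similarity scores, and thresholds are illustrative assumptions, not the paper's actual criteria.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    """A synthesized failure-recovery rollout (hypothetical structure)."""
    task_completed: bool           # does the recovery end in task success?
    frame_similarity: float        # visual-consistency score in [0, 1]
    joint_velocities: List[float]  # peak joint speeds along the trajectory (rad/s)

# Hypothetical thresholds; the paper does not publish its exact values.
MIN_FRAME_SIMILARITY = 0.8
MAX_JOINT_VELOCITY = 2.0  # rad/s

def is_task_valid(r: Rollout) -> bool:
    # Task validity: the counterfactual recovery must still achieve the goal.
    return r.task_completed

def is_visually_coherent(r: Rollout) -> bool:
    # Visual coherence: generated frames must stay consistent with the scene.
    return r.frame_similarity >= MIN_FRAME_SIMILARITY

def is_kinematically_safe(r: Rollout) -> bool:
    # Kinematic safety: no joint may exceed a safe velocity bound.
    return all(abs(v) <= MAX_JOINT_VELOCITY for v in r.joint_velocities)

def filter_rollouts(rollouts: List[Rollout]) -> List[Rollout]:
    """Keep only rollouts that pass all three structured checks."""
    return [r for r in rollouts
            if is_task_valid(r)
            and is_visually_coherent(r)
            and is_kinematically_safe(r)]
```

In this reading, only rollouts passing every check enter the 120k-sample training set; each check is independent, so failures can also be logged per criterion for diagnostics.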
Problem

Research questions and friction points this paper is trying to address:

- manipulation recovery
- failure diagnosis
- sim-to-real gap
- trajectory-level correction
- robotic error recovery
Innovation

Methods, ideas, or system contributions that make the work stand out:

- counterfactual failure synthesis
- generative world model
- actionable recovery
- structured verification
- vision-language model
Authors

- Dayou Li (Texas A&M University)
- Jiuzhou Lei (Texas A&M University)
- Hao Wang (Texas A&M University)
- Lulin Liu (Texas A&M University, University of Minnesota)
- Yunhao Yang (University of Texas at Austin): Formal methods, Autonomy, Privacy
- Zihan Wang (Abaka AI)
- Bangya Liu (University of Wisconsin-Madison)
- Minghui Zheng (J. Mike Walker '66 Department of Mechanical Engineering, Texas A&M University): Robotics, Planning, Control, Robotic Disassembly, Remanufacturing Automation
- Zhiwen Fan (Texas A&M University)