CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language navigation (VLN) models frequently deviate from correct trajectories and lack robust online error-correction capabilities. To address this, we propose the "Self-Correction Flywheel," a novel paradigm that, for the first time, treats the erroneous trajectories a model produces on the training set as a valuable supervisory signal. The method automatically detects deviations, synthesizes perception- and action-level self-correction data, and refines the model through iterative training cycles, establishing an end-to-end learnable correction loop with no reliance on manually annotated correction data. Evaluated on the R2R-CE and RxR-CE benchmarks, our method achieves success rates of 65.1% and 69.3%, surpassing prior state-of-the-art models by 8.2% and 16.4%, respectively. Real-world robotic deployment further demonstrates strong robustness to long-horizon instructions and dynamic environmental perturbations.

📝 Abstract
Existing vision-and-language navigation models often deviate from the correct trajectory when executing instructions. However, these models lack effective error-correction capability, hindering their recovery from errors. To address this challenge, we propose the Self-correction Flywheel, a novel post-training paradigm. Instead of treating the model's error trajectories on the training set as a drawback, our paradigm emphasizes their significance as a valuable data source. We develop a method to identify deviations in these error trajectories and devise techniques to automatically generate self-correction data for both perception and action. These self-correction data serve as fuel to power the model's continued training. The strength of our paradigm becomes apparent when we re-evaluate the model on the training set and uncover new error trajectories: at this point, the self-correction flywheel begins to spin. Through multiple flywheel iterations, we progressively enhance our monocular RGB-based VLA navigation model, CorrectNav. Experiments on the R2R-CE and RxR-CE benchmarks show that CorrectNav achieves new state-of-the-art success rates of 65.1% and 69.3%, surpassing prior best VLA navigation models by 8.2% and 16.4%. Real-robot tests in various indoor and outdoor environments demonstrate the method's superior capabilities in error correction, dynamic obstacle avoidance, and long-instruction following.
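The flywheel described in the abstract is an iterative mining loop: roll out the model on the training set, find where each trajectory first strays from the reference, pair that off-course state with a corrective action, and feed the resulting data back into training. A minimal toy sketch of one flywheel spin is below; note that `detect_deviation` and `synthesize_correction` are illustrative names and 2-D point trajectories are a stand-in for the paper's actual perception-action data, which this page does not detail.

```python
# Toy sketch of one Self-Correction Flywheel iteration.
# Assumption: trajectories are lists of (x, y) positions; the paper's real
# deviation detection and perception/action data synthesis are far richer.

def detect_deviation(trajectory, reference, threshold=0.5):
    """Return the index of the first step that strays from the reference path."""
    for i, (p, r) in enumerate(zip(trajectory, reference)):
        if abs(p[0] - r[0]) + abs(p[1] - r[1]) > threshold:  # L1 distance
            return i
    return None  # trajectory stayed on course

def synthesize_correction(trajectory, reference, i):
    """Pair the off-course state with an action that steers back to the path."""
    state, target = trajectory[i], reference[i]
    action = (target[0] - state[0], target[1] - state[1])  # corrective step
    return {"state": state, "action": action}

def flywheel_iteration(rollouts, references):
    """One spin: mine error trajectories for self-correction training pairs."""
    correction_data = []
    for traj, ref in zip(rollouts, references):
        i = detect_deviation(traj, ref)
        if i is not None:
            correction_data.append(synthesize_correction(traj, ref, i))
    return correction_data  # fed back into continued training, then re-rolled out
```

Re-evaluating the retrained model produces fresh error trajectories, which seed the next spin of the loop.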
Problem

Research questions and friction points this paper is trying to address.

Existing navigation models lack error correction capability
Self-correction Flywheel uses error trajectories as training data
Enhances model performance in dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-correction Flywheel for error recovery
Automatic generation of self-correction data
Iterative enhancement of VLA navigation model
👥 Authors
Zhuoyuan Yu (CFCS, School of Computer Science, Peking University)
Yuxing Long (Peking University; Embodied Intelligence)
Zihan Yang (CFCS, School of Computer Science, Peking University)
Chengyan Zeng (PKU-Agibot Lab)
Hongwei Fan (Peking University; Robotics, 3D Vision)
Jiyao Zhang (Peking University; Embodied AI, Robotics, 3D Vision)
Hao Dong (CFCS, School of Computer Science, Peking University)