Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

📅 2026-04-07
🤖 AI Summary
This work addresses a barrier to real-time control in visual navigation: generative policies based on diffusion models and Schrödinger bridges typically require many integration steps. The authors propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits the conditional velocity-field structure shared by maximum-entropy Schrödinger bridges and deterministic optimal transport. They prove that the functional form of this velocity field is invariant across entropic regularization strengths, and that reducing the entropy parameter linearly decreases the conditional velocity variance, which makes coarse-step ODE integration more stable. By anchoring transport to a learned conditional prior that shortens transport distance, RSBM achieves over 94% cosine similarity and a 92% success rate with only three integration steps, whereas standard bridges need at least ten steps to converge, and it does so without distillation or multi-stage training.
📝 Abstract
Visual navigation is a core challenge in Embodied AI, requiring autonomous agents to translate high-dimensional sensory observations into continuous, long-horizon action trajectories. While generative policies based on diffusion models and Schrödinger Bridges (SB) effectively capture multimodal action distributions, they require dozens of integration steps due to high-variance stochastic transport, posing a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework that exploits a shared velocity-field structure between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), controlled by a single entropic regularization parameter $\varepsilon$. We prove two key results: (1) the conditional velocity field's functional form is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), enabling a single network to serve all regularization strengths; and (2) reducing $\varepsilon$ linearly decreases the conditional velocity variance, enabling more stable coarse-step ODE integration. Anchored to a learned conditional prior that shortens transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage and path straightness. Empirically, while standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and 92% success rate in merely 3 integration steps -- without distillation or multi-stage training -- substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
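The two claims in the abstract (an $\varepsilon$-independent functional form and a velocity variance linear in $\varepsilon$) can be illustrated with a small one-dimensional sketch. It assumes the standard Brownian-bridge conditional path with variance $\varepsilon\,t(1-t)$ and its probability-flow velocity, which may differ from the paper's exact parameterization. The function names, the endpoints `x0` and `x1`, and the sample size are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def bridge_sample(x0, x1, t, eps, z):
    """Draw x_t from an assumed Brownian bridge: mean on the straight line
    between x0 and x1, variance eps * t * (1 - t)."""
    mean = (1.0 - t) * x0 + t * x1
    return mean + math.sqrt(eps * t * (1.0 - t)) * z


def conditional_velocity(x_t, x0, x1, t):
    """Probability-flow velocity of the assumed bridge, for 0 < t < 1.
    Note that eps does not appear: the same formula serves every eps."""
    mean = (1.0 - t) * x0 + t * x1
    return (x1 - x0) + (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (x_t - mean)


def analytic_velocity_variance(t, eps):
    """Variance of the conditional velocity under the assumed bridge.
    It is linear in eps and vanishes as eps -> 0 (straight OT path)."""
    return eps * (1.0 - 2.0 * t) ** 2 / (4.0 * t * (1.0 - t))


def empirical_velocity_variance(t, eps, n=20000, seed=0, x0=0.0, x1=1.0):
    """Monte Carlo estimate of the same variance from n bridge samples."""
    rng = random.Random(seed)
    velocities = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x_t = bridge_sample(x0, x1, t, eps, z)
        velocities.append(conditional_velocity(x_t, x0, x1, t))
    return statistics.pvariance(velocities)


if __name__ == "__main__":
    t = 0.2
    for eps in (1.0, 0.5, 0.25):
        print(eps,
              empirical_velocity_variance(t, eps),
              analytic_velocity_variance(t, eps))
```

In this sketch `conditional_velocity` takes no `eps` argument, which mirrors the invariance claim under the stated assumption, while halving `eps` halves the velocity variance at any fixed `t`. The formula is singular at `t = 0` and `t = 1`, so the sketch should only be evaluated strictly inside the interval.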
Problem

Research questions and friction points this paper is trying to address.

Visual Navigation
Schrödinger Bridge
Few-Step Generation
Embodied AI
Real-Time Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Schrödinger Bridge
Conditional Flow Matching
Velocity Structure Invariance
Entropic Regularization
Few-Step ODE Integration
Wuyang Luan
School of Mathematics, Jilin University
Junhui Li
College of Computer Science, Chongqing University
Weiguang Zhao
University of Liverpool, PhD Candidate
3D Vision, Embodied AI, Open World
Wenjian Zhang
GenY
Tieru Wu
School of Mathematics, Jilin University
Rui Ma
Associate Professor at Jilin University
computer graphics, computer vision, geometry modeling, shape analysis, content creation