RoboStereo: Dual-Tower 4D Embodied World Models for Unified Policy Optimization

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses obstacles to deploying embodied intelligence in real-world settings: the high cost and safety risks of real-world interaction, and existing world models that suffer from geometric hallucinations and lack a unified policy optimization framework. The authors propose RoboStereo, a symmetric dual-tower 4D world model that enforces spatiotemporal geometric consistency through bidirectional cross-modal enhancement. On top of it, they introduce what they describe as the first unified framework for world-model-based policy optimization, combining Test-Time Policy Augmentation (TTPA), Imitative-Evolutionary Policy Learning (IEPL), and Open-Exploration Policy Learning (OEPL). According to the authors, the approach alleviates physics hallucinations, enables multimodal policy learning, achieves state-of-the-art generation quality, and delivers an average relative improvement of more than 97% on fine-grained manipulation tasks.

📝 Abstract
Scalable Embodied AI faces fundamental constraints due to prohibitive costs and safety risks of real-world interaction. While Embodied World Models (EWMs) offer promise through imagined rollouts, existing approaches suffer from geometric hallucinations and lack unified optimization frameworks for practical policy improvement. We introduce RoboStereo, a symmetric dual-tower 4D world model that employs bidirectional cross-modal enhancement to ensure spatiotemporal geometric consistency and alleviate physics hallucinations. Building upon this high-fidelity 4D simulator, we present the first unified framework for world-model-based policy optimization: (1) Test-Time Policy Augmentation (TTPA) for pre-execution verification, (2) Imitative-Evolutionary Policy Learning (IEPL) leveraging visual perceptual rewards to learn from expert demonstrations, and (3) Open-Exploration Policy Learning (OEPL) enabling autonomous skill discovery and self-correction. Comprehensive experiments demonstrate RoboStereo achieves state-of-the-art generation quality, with our unified framework delivering >97% average relative improvement on fine-grained manipulation tasks.
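
The abstract describes TTPA only as pre-execution verification with the world model. As a rough illustration of that general idea, and not the authors' algorithm, the sketch below samples several candidate actions, imagines the outcome of each with a stand-in world model, and executes only the best-scoring one. Every name here (`select_action_with_verification`, `propose`, `imagine`, `score`, and the toy functions) is hypothetical, and the 1-D toy environment is an assumption made purely so the example runs.

```python
import random
from typing import Callable, List

State = List[float]
Action = List[float]


def select_action_with_verification(
    state: State,
    propose: Callable[[State], Action],         # hypothetical policy sampler
    imagine: Callable[[State, Action], State],  # hypothetical world-model rollout
    score: Callable[[State], float],            # hypothetical outcome scorer
    num_candidates: int = 8,
) -> Action:
    """Generic best-of-N selection: imagine each candidate, keep the best one."""
    candidates = [propose(state) for _ in range(num_candidates)]
    # Score each candidate by the state the world model predicts for it.
    return max(candidates, key=lambda action: score(imagine(state, action)))


# Toy stand-ins (assumptions): a 1-D state that should reach GOAL.
rng = random.Random(0)
GOAL = 1.0


def toy_policy(state: State) -> Action:
    # A noisy policy that proposes a random displacement.
    return [rng.uniform(-1.0, 1.0)]


def toy_world_model(state: State, action: Action) -> State:
    # Predicted next state: current position plus the displacement.
    return [state[0] + action[0]]


def toy_score(predicted: State) -> float:
    # Higher is better: negative distance to the goal.
    return -abs(predicted[0] - GOAL)


chosen = select_action_with_verification(
    [0.0], toy_policy, toy_world_model, toy_score
)
print("chosen action:", chosen)
```

In this framing the world model acts as a filter in front of the real robot: a poor candidate costs only an imagined rollout rather than a physical trial, which is the cost and safety argument the abstract makes.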

Problem

Research questions and friction points this paper is trying to address.

Embodied AI
World Models
Geometric Hallucinations
Policy Optimization
4D Simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embodied World Models
4D Simulation
Policy Optimization
Cross-modal Enhancement
Geometric Consistency