ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models

📅 2026-01-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current video-based embodied world models, which often prioritize visual fidelity at the expense of physical plausibility, dynamic consistency, and task-level reasoning—critical shortcomings for downstream tasks involving contact-rich manipulation. To overcome this, we propose the first multidimensional reward modeling framework tailored for embodied world models. Our approach integrates a hierarchical reward model with a flow-based world model architecture and leverages a large-scale video preference dataset (approximately 235K samples) alongside an efficient PPO-style alignment algorithm. This enables joint optimization during post-training across four key dimensions: physical fidelity, task completion, embodied plausibility, and visual quality. Experimental results demonstrate that our method significantly outperforms existing approaches across multiple evaluation metrics.

📝 Abstract
Recently, video-based world models that learn to simulate environment dynamics have gained increasing attention in robot learning. However, current approaches primarily emphasize visual generative quality while overlooking physical fidelity, dynamic consistency, and task logic, especially for contact-rich manipulation tasks, which limits their applicability to downstream tasks. To this end, we introduce ReWorld, a framework that employs reinforcement learning to align video-based embodied world models with physical realism, task-completion capability, embodiment plausibility, and visual quality. Specifically, we first construct a large-scale (~235K) video preference dataset and use it to train a hierarchical reward model designed to capture multi-dimensional rewards consistent with human preferences. We further propose a practical alignment algorithm that post-trains flow-based world models on this reward signal via a computationally efficient PPO-style algorithm. Comprehensive experiments and theoretical analysis demonstrate that ReWorld significantly improves the physical fidelity, logical coherence, embodiment plausibility, and visual quality of generated rollouts, outperforming previous methods.
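The abstract names a PPO-style algorithm as the alignment mechanism. As a hedged sketch of the generic technique referred to (not ReWorld's specific formulation for flow-based models), here is the standard clipped surrogate objective for a single sample; all quantities (log-probabilities, advantage, clipping epsilon) are the usual PPO ingredients and are assumptions in this context.

```python
# Generic PPO clipped surrogate loss for one sample. This illustrates the
# "PPO-style" technique the abstract names; ReWorld's actual objective for
# flow-based world models may differ.
import math

def ppo_clip_loss(logp_new: float, logp_old: float,
                  advantage: float, eps: float = 0.2) -> float:
    """Negative clipped surrogate objective (a loss to minimize)."""
    ratio = math.exp(logp_new - logp_old)          # importance ratio
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)  # clip to [1-eps, 1+eps]
    # Pessimistic (min) of the unclipped and clipped objectives, negated.
    return -min(ratio * advantage, clipped * advantage)
```

Here the advantage would be derived from the aggregated multi-dimensional reward; clipping keeps each post-training update close to the pretrained world model, which is what makes this family of methods computationally stable and efficient.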
Problem

Research questions and friction points this paper is trying to address.

world models
physical fidelity
dynamic consistency
task logic
embodied AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward modeling
world models
embodied AI
reinforcement learning
video generation