WorldCompass: Reinforcement Learning for Long-Horizon World Models

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited interaction accuracy and temporal consistency of long-horizon interactive video world models by proposing WorldCompass, a reinforcement learning-based post-training framework. The approach introduces a clip-level rollout strategy, complementary reward functions that jointly target interaction-following accuracy and visual quality, and an RL algorithm that combines negative-aware fine-tuning with efficiency optimizations. Experiments on the WorldPlay model show that the method significantly improves the accuracy of interactive responses and the visual fidelity of generated videos, suppresses reward hacking, and enhances temporal coherence in long-duration video generation.

📝 Abstract
This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for long-horizon, interactive video-based world models, enabling them to explore the world more accurately and consistently based on interaction signals. To effectively "steer" the world model's exploration, we introduce three core innovations tailored to the autoregressive video generation paradigm: 1) Clip-Level Rollout Strategy: We generate and evaluate multiple samples at a single target clip, which significantly boosts rollout efficiency and provides fine-grained reward signals. 2) Complementary Reward Functions: We design reward functions for both interaction-following accuracy and visual quality, which provide direct supervision and effectively suppress reward-hacking behaviors. 3) Efficient RL Algorithm: We employ a negative-aware fine-tuning strategy coupled with various efficiency optimizations to enhance model capacity efficiently and effectively. Evaluations on the state-of-the-art (SoTA) open-source world model, WorldPlay, demonstrate that WorldCompass significantly improves interaction accuracy and visual fidelity across various scenarios.
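
The clip-level rollout and complementary rewards described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the names `clip_level_rollout`, `sample_clip`, `interaction_reward`, `quality_reward`, and `quality_weight`, the weighted-sum reward, and the group-normalized advantage are all assumptions made for illustration. The idea shown is only that several candidates are sampled for one target clip from a shared prefix, each is scored by two rewards, and the scores are compared within the group.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class ScoredClip:
    clip: Any          # one sampled candidate for the target clip
    reward: float      # combined interaction + visual-quality reward
    advantage: float   # reward normalized within the sampled group


def clip_level_rollout(
    prefix: Any,
    action: Any,
    sample_clip: Callable[[Any, Any], Any],          # hypothetical generator call
    interaction_reward: Callable[[Any, Any], float],  # hypothetical reward model
    quality_reward: Callable[[Any], float],           # hypothetical reward model
    num_samples: int = 4,
    quality_weight: float = 0.5,                      # assumed mixing weight
) -> List[ScoredClip]:
    """Sample several candidates for ONE target clip and score them as a group."""
    # All candidates share the same prefix, so only the target clip is re-generated.
    candidates = [sample_clip(prefix, action) for _ in range(num_samples)]

    # Assumed combination: a weighted sum of the two complementary rewards.
    rewards = [
        interaction_reward(c, action) + quality_weight * quality_reward(c)
        for c in candidates
    ]

    # Assumed advantage: center and scale rewards within the group.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    scale = std if std > 1e-8 else 1.0
    return [ScoredClip(c, r, (r - mean) / scale) for c, r in zip(candidates, rewards)]


def split_by_advantage(scored: List[ScoredClip]) -> Tuple[List[ScoredClip], List[ScoredClip]]:
    """Partition samples into above-average and below-average groups."""
    positives = [s for s in scored if s.advantage > 0]
    negatives = [s for s in scored if s.advantage <= 0]
    return positives, negatives


if __name__ == "__main__":
    # Toy stand-ins: a "clip" is a single number (camera displacement),
    # and the action is the displacement the user asked for.
    rng = random.Random(0)
    scored = clip_level_rollout(
        prefix=[],
        action=1.0,
        sample_clip=lambda p, a: a + rng.gauss(0.0, 0.5),
        interaction_reward=lambda c, a: -abs(c - a),
        quality_reward=lambda c: 1.0,
        num_samples=8,
    )
    positives, negatives = split_by_advantage(scored)
    print(len(positives), "above-average and", len(negatives), "below-average samples")
```

A split like the one in `split_by_advantage` is one plausible way to feed both preferred and dispreferred samples into a negative-aware update, but how WorldCompass actually uses the samples is not specified in the abstract.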
Problem

Research questions and friction points this paper is trying to address.

long-horizon world models
interactive video generation
reinforcement learning
world model exploration
autoregressive video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
World Models
Clip-level Rollout
Complementary Reward Functions
Negative-aware Fine-tuning