IC-World: In-Context Generation for Shared World Modeling

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of geometric and motion inconsistency across views in multi-view video generation. To this end, we propose IC-World, the first framework to systematically explore video world models for shared world modeling. Methodologically, IC-World leverages the contextual generation capability of large-scale video foundation models: given multi-view static images as input, it concurrently synthesizes dynamic video sequences across all views. We further introduce a reinforcement learning mechanism based on Group Relative Policy Optimization (GRPO), coupled with a novel dual reward model that explicitly enforces scene-level geometric consistency and object-level motion consistency, for end-to-end optimization. Experiments demonstrate that IC-World significantly outperforms state-of-the-art methods on both geometric and motion consistency metrics, enabling high-fidelity, cross-view coherent dynamic content generation.

📝 Abstract
Video-based world models have recently garnered increasing attention for their ability to synthesize diverse and dynamic visual environments. In this paper, we focus on shared world modeling, where a model generates multiple videos from a set of input images, each representing the same underlying world under a different camera pose. We propose IC-World, a novel generation framework that enables parallel generation across all input images by activating the inherent in-context generation capability of large video models. We further finetune IC-World with reinforcement learning, using Group Relative Policy Optimization together with two novel reward models that enforce scene-level geometry consistency and object-level motion consistency among the set of generated videos. Extensive experiments demonstrate that IC-World substantially outperforms state-of-the-art methods in both geometry and motion consistency. To the best of our knowledge, this is the first work to systematically explore the shared world modeling problem with video-based world models.
Problem

Research questions and friction points this paper is trying to address.

Generating multiple mutually consistent videos from a shared set of input images
Enforcing geometry and motion consistency across the generated videos
Shared world modeling with video-based world models remains systematically unexplored
Innovation

Methods, ideas, or system contributions that make the work stand out.

Activates in-context generation in large video models
Uses reinforcement learning with Group Relative Policy Optimization
Employs novel reward models for geometry and motion consistency
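The GRPO-style optimization described above scores a group of candidate generations and normalizes rewards within the group. A minimal sketch of that advantage computation is shown below; the equal reward weights, the function name, and the scalar reward inputs are illustrative assumptions, not the paper's actual reward models.

```python
import numpy as np

def group_relative_advantages(geom_rewards, motion_rewards,
                              w_geom=0.5, w_motion=0.5):
    """GRPO-style advantage: combine the two per-sample rewards,
    then normalize within the group as (r - mean(r)) / std(r)."""
    r = (w_geom * np.asarray(geom_rewards, dtype=float)
         + w_motion * np.asarray(motion_rewards, dtype=float))
    return (r - r.mean()) / (r.std() + 1e-8)

# Three candidate video groups, each scored for geometry and motion.
adv = group_relative_advantages([0.9, 0.4, 0.7], [0.8, 0.5, 0.6])
```

Candidates scoring above the group mean get positive advantages and are reinforced; those below are suppressed, without needing a learned value function.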
Fan Wu
Nanyang Technological University
Jiacheng Wei
Nanyang Technological University
Ruibo Li
Nanyang Technological University
Yi Xu
Goertek Alpha Labs
Junyou Li
Tencent
Deheng Ye
Director of AI, Tencent
Applied machine learning
Guosheng Lin
Nanyang Technological University
Computer Vision, Machine Learning