🤖 AI Summary
This work addresses the challenge of geometric and motion inconsistency across views in multi-view video generation. To this end, we propose IC-World, the first framework to systematically explore video world models for shared world modeling. Methodologically, IC-World leverages the contextual generation capability of large-scale video foundation models: given multi-view static images as input, it concurrently synthesizes dynamic video sequences across all views. We further introduce a reinforcement learning mechanism based on group-relative policy optimization, coupled with a novel dual reward model that explicitly enforces scene-level geometric consistency and object-level motion consistency, for end-to-end optimization. Experiments demonstrate that IC-World significantly outperforms state-of-the-art methods on both geometric and motion consistency metrics, enabling high-fidelity, cross-view coherent dynamic content generation.
📝 Abstract
Video-based world models have recently garnered increasing attention for their ability to synthesize diverse and dynamic visual environments. In this paper, we focus on shared world modeling, where a model generates multiple videos from a set of input images, each representing the same underlying world from a different camera pose. We propose IC-World, a novel generation framework that enables parallel generation for all input images by activating the inherent in-context generation capability of large video models. We further finetune IC-World via reinforcement learning with Group Relative Policy Optimization (GRPO), together with two proposed novel reward models that enforce scene-level geometric consistency and object-level motion consistency across the set of generated videos. Extensive experiments demonstrate that IC-World substantially outperforms state-of-the-art methods in both geometric and motion consistency. To the best of our knowledge, this is the first work to systematically explore the shared world modeling problem with video-based world models.
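The abstract's training recipe, GRPO with a dual reward, can be sketched at a high level. The snippet below is a minimal, hypothetical illustration of the group-relative part of GRPO: two scalar reward signals (standing in for the scene-level geometry reward and the object-level motion reward, whose actual definitions are not given here) are combined per sampled group member, then normalized against the group's own mean and standard deviation to form advantages. The weights `w_geo` and `w_mot` and the function names are assumptions, not the paper's implementation.

```python
from statistics import mean, pstdev

def grpo_advantages(geometry_rewards, motion_rewards, w_geo=0.5, w_mot=0.5):
    """Illustrative GRPO-style advantage computation with a dual reward.

    Each list holds one scalar score per sampled rollout in the group.
    The combined reward is normalized within the group, so the group
    itself serves as the baseline (no learned critic is needed).
    """
    # Weighted combination of the two reward signals (assumed scalars).
    combined = [w_geo * g + w_mot * m
                for g, m in zip(geometry_rewards, motion_rewards)]
    mu = mean(combined)
    sigma = pstdev(combined) or 1.0  # avoid division by zero
    # Group-relative advantage: how much each sample beats its peers.
    return [(r - mu) / sigma for r in combined]

# A group of four rollouts with toy geometry and motion scores:
adv = grpo_advantages([0.9, 0.2, 0.5, 0.4], [0.8, 0.1, 0.6, 0.5])
```

These advantages would then weight the policy-gradient update for each rollout, so that generations with above-average combined consistency are reinforced relative to their group.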