🤖 AI Summary
This work addresses a limitation of existing video world models: most are confined to single-agent scenarios and fail to capture multi-agent interactions and cross-view consistency. The authors propose MultiWorld, a unified multi-agent, multi-view video world model that generates future frames conditioned on past frames and current actions, enabling precise control over multiple agents and coherent observations across viewpoints. Its two key components are the Multi-Agent Condition Module, which provides multi-agent controllability, and the Global State Encoder, which keeps observations consistent across views. The model scales flexibly in the number of agents and views and synthesizes the views in parallel. Evaluated on multi-player game environments and multi-robot manipulation tasks, the method outperforms baselines in video fidelity, action-following ability, and multi-view consistency.
📝 Abstract
Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or agents. They are modeled as action-conditioned video generation models that take historical frames and current actions as input to predict future frames. Yet, most existing approaches are limited to single-agent scenarios and fail to capture the complex interactions inherent in real-world multi-agent systems. We present MultiWorld, a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. We introduce the Multi-Agent Condition Module to achieve precise multi-agent controllability, and the Global State Encoder to ensure coherent observations across different views. MultiWorld supports flexible scaling of agent and view counts, and synthesizes different views in parallel for high efficiency. Experiments on multi-player game environments and multi-robot manipulation tasks demonstrate that MultiWorld outperforms baselines in video fidelity, action-following ability, and multi-view consistency. Project page: https://multi-world.github.io/
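The action-conditioned formulation described in the abstract (historical frames plus current actions in, future frames out) can be illustrated with a minimal sketch of an autoregressive rollout over several agents and views. Everything here is an assumption made for illustration: the names `Views`, `Actions`, `toy_step`, and `rollout` are hypothetical, frames are short lists of floats rather than images, and `toy_step` is a hand-written stand-in, not the MultiWorld architecture or its modules.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]        # stand-in for an image; a real model would use tensors
Views = Dict[str, Frame]   # one frame per view name (hypothetical layout)
Actions = Dict[str, float] # one toy scalar action per agent name (hypothetical)

# A world-model step maps (history of multi-view frames, current actions)
# to the next set of multi-view frames.
StepFn = Callable[[Sequence[Views], Actions], Views]


def toy_step(history: Sequence[Views], actions: Actions) -> Views:
    """Toy stand-in for a learned model (an assumption, not the paper's method).

    All views are updated from the same summed action, so every view
    changes by the same amount at each step.
    """
    total = sum(actions.values())
    last = history[-1]
    return {view: [pixel + total for pixel in frame] for view, frame in last.items()}


def rollout(step: StepFn, context: Sequence[Views],
            action_plan: Sequence[Actions]) -> List[Views]:
    """Autoregressively predict one set of multi-view frames per action step."""
    history = list(context)
    predicted: List[Views] = []
    for actions in action_plan:
        next_views = step(history, actions)
        predicted.append(next_views)
        history.append(next_views)  # predictions are fed back as history
    return predicted


# Two views, two agents, two steps of actions.
context = [{"cam_a": [0.0, 0.0], "cam_b": [1.0, 1.0]}]
plan = [{"agent_1": 1.0, "agent_2": 2.0}, {"agent_1": -1.0, "agent_2": 0.0}]
future = rollout(toy_step, context, plan)
```

Keying actions by agent and frames by view means the same `rollout` loop works unchanged when agents or views are added, which mirrors the scaling property the abstract claims, though only at the interface level.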