A Benchmark for Interactive World Models with a Unified Action Generation Framework

📅 2026-05-05
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the lack of large-scale datasets and unified benchmarks for assessing the physical interaction capabilities of world models. The authors introduce iWorld-Bench, a benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. It is built on a dataset of 330,000 video clips, from which 2,100 high-quality samples covering varied perspectives, weather, and scenes are selected. Because existing world models differ in interaction modalities, the authors propose a unified Action Generation Framework and design six task types that yield 4,900 test samples. A systematic assessment of 14 representative world models uncovers key limitations in distance perception, memory, visual generation, and trajectory following. The model leaderboard is publicly available at iWorld-Bench.com.
📝 Abstract
Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their physical interaction capabilities. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. We construct a diverse dataset with 330k video clips and select 2.1k high-quality samples covering varied perspectives, weather, and scenes. As existing world models differ in interaction modalities, we introduce an Action Generation Framework to unify evaluation and design six task types, generating 4.9k test samples. These tasks jointly assess model performance across visual generation, trajectory following, and memory. Evaluating 14 representative world models, we identify key limitations and provide insights for future research. The iWorld-Bench model leaderboard is publicly available at iWorld-Bench.com.
Problem

Research questions and friction points this paper is trying to address.

interactive world models
benchmark
physical interaction
action generation
AGI
Innovation

Methods, ideas, or system contributions that make the work stand out.

interactive world models
unified action generation framework
large-scale benchmark
physical interaction evaluation
memory and perception tasks