A Benchmark for Interactive World Models with a Unified Action Generation Framework

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of large-scale datasets and standardized evaluation benchmarks for assessing physical interaction capabilities in world models. To this end, we introduce iWorld-Bench, the first standardized benchmark specifically designed for evaluating interaction-related abilities. Its dataset contains 330,000 video clips, from which 2,100 high-quality samples are selected, and its six task categories yield 4,900 test instances. We also propose a unified action generation framework so that world models with different interaction modalities can be evaluated consistently. Using this benchmark, we systematically assess 14 representative world models and uncover critical limitations in distance perception, memory retention, visual generation, and trajectory following. The benchmark, evaluation results, and a public leaderboard are released to foster further research in interactive world modeling.

📝 Abstract

Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their physical interaction capabilities. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. We construct a diverse dataset with 330k video clips and select 2.1k high-quality samples covering varied perspectives, weather, and scenes. As existing world models differ in interaction modalities, we introduce an Action Generation Framework to unify evaluation and design six task types, generating 4.9k test samples. These tasks jointly assess model performance across visual generation, trajectory following, and memory. Evaluating 14 representative world models, we identify key limitations and provide insights for future research. The iWorld-Bench model leaderboard is publicly available at iWorld-Bench.com.

Problem

Research questions and friction points this paper is trying to address.

interactive world models

benchmark

physical interaction

action generation

AGI

Innovation

Methods, ideas, or system contributions that make the work stand out.

interactive world models

unified action generation framework

large-scale benchmark