MIND: Benchmarking Memory Consistency and Action Control in World Models

📅 2026-02-08

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

World model research lacks a unified open-domain, closed-loop benchmark, which makes it difficult to systematically evaluate memory consistency and action control. To address this gap, this work proposes MIND, a benchmark comprising 250 high-resolution (1080p, 24 FPS) multi-view synchronized video sequences that span a diverse action space, including variations in movement speed and camera rotation, together with a closed-loop interactive evaluation framework. The work also presents MIND-World, the first Video-to-World baseline method designed for open-domain scenarios. Experimental results show that current models still struggle with long-term memory stability and with generalization across action spaces, which positions MIND as a platform for future research in world modeling.

📝 Abstract

World models aim to understand, remember, and predict dynamic visual environments, yet a unified benchmark for evaluating their fundamental abilities remains lacking. To address this gap, we introduce MIND, the first open-domain closed-loop revisited benchmark for evaluating Memory consIstency and action coNtrol in worlD models. MIND contains 250 high-quality videos at 1080p and 24 FPS, including 100 (first-person) + 100 (third-person) video clips under a shared action space and 25 + 25 clips across varied action spaces covering eight diverse scenes. We design an efficient evaluation framework to measure two core abilities: memory consistency and action control, capturing temporal stability and contextual coherence across viewpoints. Furthermore, we design various action spaces, including different character movement speeds and camera rotation angles, to evaluate the action generalization capability across different action spaces under shared scenes. To facilitate future performance benchmarking on MIND, we introduce MIND-World, a novel interactive Video-to-World baseline. Extensive experiments demonstrate the completeness of MIND and reveal key challenges in current world models, including the difficulty of maintaining long-term memory consistency and generalizing across action spaces. Code: https://github.com/CSU-JPG/MIND.
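
The clip counts quoted in the abstract can be checked with a short tally. The sketch below is illustrative only: the `Split` record and the helper names are hypothetical and are not part of the MIND codebase, and assigning the "25 + 25" varied-action clips to the first-person and third-person views is an assumption based on how the shared-action split is described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """One slice of the benchmark (hypothetical record, not from the MIND repo)."""
    view: str          # "first-person" or "third-person"
    action_space: str  # "shared" or "varied"
    clips: int


# Counts taken from the abstract. Mapping the "25 + 25" clips to the two
# views is an assumption, not something the abstract states explicitly.
SPLITS = [
    Split("first-person", "shared", 100),
    Split("third-person", "shared", 100),
    Split("first-person", "varied", 25),
    Split("third-person", "varied", 25),
]


def total_clips(splits):
    """Sum the clip counts over all splits."""
    return sum(s.clips for s in splits)


def clips_by(splits, field):
    """Group clip counts by one attribute of Split, e.g. 'view'."""
    totals = {}
    for s in splits:
        key = getattr(s, field)
        totals[key] = totals.get(key, 0) + s.clips
    return totals


if __name__ == "__main__":
    print(total_clips(SPLITS))               # 250
    print(clips_by(SPLITS, "action_space"))  # {'shared': 200, 'varied': 50}
```

Under these assumptions the four splits sum to the 250 videos stated in the abstract, with 200 clips in the shared action space and 50 in the varied ones.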

Problem

Research questions and friction points this paper is trying to address.

world models

memory consistency

action control

benchmarking

temporal stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

world models

memory consistency

action control