AI Summary
This work addresses the challenge of reasoning about counterfactual spatial questions in real-world settings where active exploration is infeasible due to physical constraints or safety concerns. To this end, the authors propose WanderDream, the first large-scale dataset for mental exploration, which leverages a world model to enable embodied agents to simulate trajectories from their current viewpoint to hypothetical target situations entirely “in the mind,” without physical movement. The dataset is constructed by synthesizing panoramic videos and associated spatial question-answer pairs, supporting both trajectory generation and cross-scene transfer evaluation in real environments. Experimental results demonstrate that mental exploration substantially enhances the embodied reasoning capabilities of multimodal large language models in real-world scenarios, validating the effectiveness and generalizability of the proposed approach.
Abstract
Situated reasoning often relies on active exploration, yet in many real-world scenarios such exploration is infeasible due to physical constraints of robots or safety concerns of visually impaired users. Given only a limited observation, can an agent mentally simulate a future trajectory toward a target situation and answer spatial what-if questions? We introduce WanderDream, the first large-scale dataset designed for the emulative simulation of mental exploration, enabling models to reason without active exploration. WanderDream-Gen comprises 15.8K panoramic videos across 1,088 real scenes from HM3D, ScanNet++, and real-world captures, depicting imagined trajectories from current viewpoints to target situations. WanderDream-QA contains 158K question-answer pairs, covering starting states, paths, and end states along each trajectory to comprehensively evaluate exploration-based reasoning. Extensive experiments with world models and MLLMs demonstrate (1) that mental exploration is essential for situated reasoning, (2) that world models achieve compelling performance on WanderDream-Gen, (3) that imagination substantially facilitates reasoning on WanderDream-QA, and (4) that WanderDream data exhibit remarkable transferability to real-world scenarios. The source code and all data will be released.