VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

📅 2025-03-18

🤖 AI Summary

This work addresses the challenge of evaluating exploration-driven decision-making by AI agents in dynamic virtual escape rooms. We introduce VisEscape, a benchmark of 20 virtual escape rooms in which success depends not only on solving isolated puzzles but also on iteratively building and refining spatial-temporal knowledge of a changing environment. Even state-of-the-art multimodal models generally fail to escape these rooms. To address this, we propose VisEscaper, a framework that integrates Memory, Feedback, and ReAct modules, so that the agent records what it has discovered, reasons before each action, and revises its plan in response to environment feedback. On average, VisEscaper performs 3.7 times more effectively and 5.0 times more efficiently.
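The interplay of memory, feedback, and a reason-then-act loop can be illustrated with a small sketch. This is not the authors' implementation: `ToyRoom`, `Memory`, `reason`, and `run_episode` are hypothetical names, the room is a made-up two-step puzzle with text observations instead of images, and the rule-based `reason` function stands in for a multimodal model call.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRoom:
    """Hypothetical room: take the key from the drawer, then open the door."""
    has_key: bool = False
    escaped: bool = False

    def step(self, action: str) -> str:
        # Returns a text observation; a real benchmark would return an image.
        if action == "inspect drawer":
            return "the drawer contains a key"
        if action == "take key":
            self.has_key = True
            return "you picked up the key"
        if action == "open door":
            if self.has_key:
                self.escaped = True
                return "the door opens"
            return "the door is locked"
        return "nothing happens"


@dataclass
class Memory:
    """External store of (action, observation) pairs gathered so far."""
    records: list = field(default_factory=list)

    def add(self, action: str, observation: str) -> None:
        self.records.append((action, observation))

    def seen(self, text: str) -> bool:
        return any(text in obs for _, obs in self.records)


def reason(memory: Memory, feedback: str) -> tuple:
    """Stand-in for the model call (assumption): returns (thought, action)."""
    if memory.seen("picked up the key"):
        return "I hold the key, so the door should open.", "open door"
    if memory.seen("contains a key"):
        return "The drawer has a key I can take.", "take key"
    if feedback:
        return f"Last attempt failed ({feedback}); explore instead.", "inspect drawer"
    return "Try the exit first.", "open door"


def run_episode(room: ToyRoom, max_steps: int = 10) -> list:
    """Reason, act, store the result, and pass failure feedback to the next step."""
    memory, feedback, trace = Memory(), "", []
    for _ in range(max_steps):
        thought, action = reason(memory, feedback)
        observation = room.step(action)
        memory.add(action, observation)
        # Feedback: keep only observations that signal a failed action.
        failed = "locked" in observation or "nothing" in observation
        feedback = observation if failed else ""
        trace.append((thought, action))
        if room.escaped:
            break
    return trace
```

Calling `run_episode(ToyRoom())` returns the list of (thought, action) pairs the agent produced. In this toy setting the agent first tries the door, receives failure feedback, explores the drawer, and then uses what it stored in memory to take the key and escape.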

📝 Abstract

Escape rooms present a unique cognitive challenge that demands exploration-driven planning: players should actively search their environment, continuously update their knowledge based on new discoveries, and connect disparate clues to determine which elements are relevant to their objectives. Motivated by this, we introduce VisEscape, a benchmark of 20 virtual escape rooms specifically designed to evaluate AI models under these challenging conditions, where success depends not only on solving isolated puzzles but also on iteratively constructing and refining spatial-temporal knowledge of a dynamically changing environment. On VisEscape, we observed that even state-of-the-art multimodal models generally fail to escape the rooms, showing considerable variation in their levels of progress and trajectories. To address this issue, we propose VisEscaper, which effectively integrates Memory, Feedback, and ReAct modules, demonstrating significant improvements by performing 3.7 times more effectively and 5.0 times more efficiently on average.

Problem

Research questions and friction points this paper is trying to address.

Evaluating AI in exploration-driven decision-making

Assessing spatial-temporal knowledge in dynamic environments

Improving AI performance in virtual escape rooms

Innovation

Methods, ideas, or system contributions that make the work stand out.

VisEscape benchmark for AI evaluation

VisEscaper integrates Memory, Feedback, and ReAct modules

Performs 3.7 times more effectively and 5.0 times more efficiently on average
