Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

📅 2025-10-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current AI agents excel at static reasoning tasks (e.g., mathematical problem solving and code generation) but struggle in dynamic, interactive environments that demand long-horizon planning, such as web navigation and mobile UI manipulation. To address this, the paper proposes *vicarious trial and error*: a cognitive capability that lets agents mentally simulate multiple future action trajectories before execution, improving both environmental understanding and planning robustness. It introduces Dyna-Mind, a two-stage training framework: (1) an experience-grounded mental simulation stage, Reasoning with Simulations (ReSim), which trains the agent on structured reasoning traces distilled from search trees built over real interaction data; and (2) an online reinforcement learning stage, Dyna-GRPO, which optimizes the policy using both outcome rewards and intermediate-state feedback from real rollouts. Evaluated on the Sokoban, ALFWorld, and AndroidWorld benchmarks, the method significantly improves performance on long-horizon, planning-intensive tasks, empirically supporting the central role of mental simulation in interactive intelligence.
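The mental-simulation idea can be made concrete with a toy sketch: before committing to an action, the agent rolls out each candidate with a world model and picks the one whose simulated future scores best. Everything below is illustrative, not the paper's method: the "world model" is just a dictionary of known transitions (standing in for the learned (V)LM), states are integers standing in for distance-to-goal, and the function names `simulate` and `act_with_simulation` are invented for this sketch.

```python
def simulate(world_model, state, action, depth, score_fn, actions):
    """Roll out one action mentally, returning the best score reachable
    within `depth` further steps (a depth-limited search-tree expansion)."""
    next_state = world_model.get((state, action))
    if next_state is None or depth == 0:
        # Unknown transition or horizon reached: score the current state.
        return score_fn(state)
    best = score_fn(next_state)
    for a in actions:
        best = max(best, simulate(world_model, next_state, a,
                                  depth - 1, score_fn, actions))
    return best

def act_with_simulation(world_model, state, actions, score_fn, depth=2):
    """Vicarious trial and error: compare the simulated future of every
    candidate action, then execute only the most promising one."""
    return max(actions, key=lambda a: simulate(world_model, state, a,
                                               depth, score_fn, actions))
```

For example, with states encoding distance-to-goal and `score_fn = lambda s: -s`, an action that looks bad after one step (`3 → 2`) can still win because a deeper rollout reveals it leads to the goal (`2 → 0`), which a greedy one-step agent would miss.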

๐Ÿ“ Abstract
Reasoning models have recently shown remarkable progress in domains such as math and coding. However, their expert-level abilities in math and coding contrast sharply with their performance in long-horizon, interactive tasks such as web navigation and computer/phone use. Inspired by literature on human cognition, we argue that current AI agents need "vicarious trial and error" - the capacity to mentally simulate alternative futures before acting - in order to enhance their understanding and performance in complex interactive environments. We introduce Dyna-Mind, a two-stage training framework that explicitly teaches (V)LM agents to integrate such simulation into their reasoning. In stage 1, we introduce Reasoning with Simulations (ReSim), which trains the agent to generate structured reasoning traces from expanded search trees built from real experience gathered through environment interactions. ReSim thus grounds the agent's reasoning in faithful world dynamics and equips it with the ability to anticipate future states in its reasoning. In stage 2, we propose Dyna-GRPO, an online reinforcement learning method to further strengthen the agent's simulation and decision-making ability by using both outcome rewards and intermediate states as feedback from real rollouts. Experiments on two synthetic benchmarks (Sokoban and ALFWorld) and one realistic benchmark (AndroidWorld) demonstrate that (1) ReSim effectively infuses simulation ability into AI agents, and (2) Dyna-GRPO leverages outcome and interaction-level signals to learn better policies for long-horizon, planning-intensive tasks. Together, these results highlight the central role of simulation in enabling AI agents to reason, plan, and act more effectively in ever more challenging environments.
Problem

Research questions and friction points this paper is trying to address.

Enhancing AI agents' performance in long-horizon interactive tasks
Teaching AI agents to mentally simulate alternative futures before acting
Improving reasoning and planning in complex interactive environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains agents (ReSim) to generate simulation-grounded reasoning traces from real environment experience
Uses online reinforcement learning (Dyna-GRPO) with outcome rewards and intermediate-state feedback
Enhances simulation ability for long-horizon interactive tasks
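The online reinforcement learning ingredient mentioned above builds on GRPO, whose core trick is a group-relative advantage: sample a group of rollouts per task and score each against the group's mean outcome reward. The sketch below shows only that standard group-relative part, as a minimal illustration; Dyna-GRPO's additional use of intermediate states as feedback is not modeled here, and the function name is invented for this sketch.

```python
import statistics

def group_relative_advantages(outcome_rewards, eps=1e-6):
    """GRPO-style advantage for each rollout in a sampled group:
    (reward - group mean) / (group std + eps). Rollouts that beat the
    group average get positive advantage and are reinforced."""
    mean = statistics.fmean(outcome_rewards)
    std = statistics.pstdev(outcome_rewards)
    return [(r - mean) / (std + eps) for r in outcome_rewards]
```

With binary task-success rewards such as `[1, 0, 1, 0]`, the two successful rollouts receive advantage ≈ +1 and the failed ones ≈ -1, so no separate learned value baseline is needed.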