ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Current vision-language models (e.g., GPT-4o) lack human-like capabilities for online reflection, error correction, and autonomous exploration in complex, long-horizon web tasks. To address this, the authors propose ExACT, which combines Reflective Monte Carlo Tree Search (R-MCTS), a test-time search algorithm that augments MCTS with contrastive reflection and multi-agent debate for state evaluation, with Exploratory Learning, a fine-tuning strategy that transfers search knowledge back into the model's parameters. The fine-tuned model internalizes search behavior (exploring, evaluating states, and backtracking) without relying on external search algorithms, and the approach scales with compute at both training and inference time. On VisualWebArena, the R-MCTS agent achieves a 6% to 30% relative improvement over the previous state of the art, and after Exploratory Learning, fine-tuned GPT-4o matches 87% of R-MCTS's performance with significantly less compute, demonstrating feasibility for lightweight deployment.

📝 Abstract
Autonomous agents have demonstrated significant potential in automating complex multi-step decision-making tasks. However, even state-of-the-art vision-language models (VLMs), such as GPT-4o, still fall short of human-level performance, particularly in intricate web environments and long-horizon tasks. To address these limitations, we present ExACT, an approach that combines test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test-time algorithm designed to enhance AI agents' ability to explore the decision space on the fly. R-MCTS extends traditional MCTS by 1) incorporating contrastive reflection, allowing agents to learn from past interactions and dynamically improve their search efficiency; and 2) using multi-agent debate for reliable state evaluation. Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms. On the challenging VisualWebArena benchmark, our GPT-4o based R-MCTS agent achieves a 6% to 30% relative improvement across various tasks compared to the previous state of the art. Additionally, we show that the knowledge and experience gained from test-time search can be effectively transferred back to GPT-4o via fine-tuning. After Exploratory Learning, GPT-4o 1) demonstrates the ability to explore the environment, evaluate a state, and backtrack to viable ones when it detects that the current state cannot lead to success, and 2) matches 87% of R-MCTS's performance while using significantly less compute. Notably, our work demonstrates the compute scaling properties at both training time (data collection with R-MCTS) and testing time. These results suggest a promising research direction to enhance VLMs' capabilities for agentic applications via test-time search and self-learning.
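As a rough illustration of the search loop R-MCTS builds on, here is a minimal, self-contained MCTS sketch on a toy integer-search environment. Everything here is an illustrative assumption: the environment, the scalar `evaluate` heuristic (standing in for the paper's multi-agent debate evaluator), and the `reflections` set (a crude stand-in for contrastive reflection). The paper's actual agent operates on web pages with a VLM, not on integers.

```python
import math
import random

# Toy environment: reach TARGET from a starting integer using two actions.
ACTIONS = {"inc": lambda x: x + 1, "dbl": lambda x: x * 2}
TARGET = 10

class Node:
    """One state in the search tree."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def uct(child, parent_visits, c=1.4):
    """Upper-confidence score used to choose which child to descend into."""
    if child.visits == 0:
        return float("inf")
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def evaluate(state, reflections):
    """Stand-in for the paper's multi-agent-debate evaluator: a scalar
    heuristic, down-weighted if the state was previously marked a dead end."""
    score = 1.0 / (1.0 + abs(TARGET - state))
    if state in reflections:
        score *= 0.1  # crude stand-in for contrastive reflection
    return score

def mcts(root_state, iters=200, seed=0):
    rng = random.Random(seed)
    root, reflections = Node(root_state), set()
    for _ in range(iters):
        node = root
        # Selection: descend via UCT until reaching a leaf.
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # Expansion: grow the tree below previously visited, non-failed leaves.
        if node.visits > 0 and node.state <= TARGET:
            for name, fn in ACTIONS.items():
                node.children.append(Node(fn(node.state), node, name))
            node = rng.choice(node.children)
        # Reflection: remember overshoot states so later rollouts avoid them.
        if node.state > TARGET:
            reflections.add(node.state)
        reward = evaluate(node.state, reflections)
        # Backpropagation: update statistics along the path to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Act by picking the most-visited child of the root.
    return max(root.children, key=lambda ch: ch.visits).action
```

Calling `mcts(3)` returns whichever first action ("inc" or "dbl") accumulated the most visits during search from state 3.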
Problem

Research questions and friction points this paper is trying to address.

Complex Task Handling
Long-term Reasoning
Adaptive Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

R-MCTS
Exploratory Learning
ExACT Method
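The Exploratory Learning idea, teaching the model to explore, evaluate, and backtrack by fine-tuning on search trees rather than only on best-action paths, can be pictured as a tree-flattening step that keeps backtracking visible in the training trajectory. The dict format, field names, and explicit `backtrack` records below are hypothetical assumptions for illustration, not the paper's actual data format:

```python
def tree_to_trajectory(node, trajectory=None):
    """Depth-first flattening of a toy search tree into a trajectory that
    records exploration, state evaluation, and explicit backtracking."""
    if trajectory is None:
        trajectory = []
    trajectory.append(("visit", node["state"], node["value"]))
    for child in node.get("children", []):
        tree_to_trajectory(child, trajectory)
        # After finishing a subtree, emit an explicit backtrack step so a
        # fine-tuned model sees backtracking as a learnable action.
        trajectory.append(("backtrack", node["state"], node["value"]))
    return trajectory

# Hypothetical two-branch search tree: one abandoned branch, one successful.
tree = {
    "state": "s0", "value": 0.3,
    "children": [
        {"state": "s1", "value": 0.2},  # low-value branch, abandoned
        {"state": "s2", "value": 0.9},  # successful branch
    ],
}
traj = tree_to_trajectory(tree)
```

Fine-tuning on trajectories like `traj`, rather than on the single best path `s0 -> s2`, is what would let the model internalize the explore/evaluate/backtrack loop described in the abstract.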
👥 Authors
Xiao Yu (Columbia University, NY)
Baolin Peng (Microsoft Research, Redmond; NLP, dialog, foundation models, alignment)
Vineeth Vajipey (Columbia University, NY)
Hao Cheng (Microsoft Research, Redmond)
Michel Galley (Sr. Principal Research Manager at Microsoft; natural language processing, deep learning, machine learning)
Jianfeng Gao (Microsoft Research, Redmond)
Zhou Yu (Columbia University, NY)