🤖 AI Summary
This work addresses the challenge that large language model (LLM) agents in sequential decision-making often fail due to hallucination, while relying exclusively on high-performance LLMs incurs prohibitive computational cost. To mitigate this trade-off, the authors propose ReDAct, a dual-LLM framework that invokes a large LLM only when necessary. The deferral decision is guided by uncertainty estimates from a smaller, more efficient model, combined with calibrated thresholds and conditional inference strategies. Evaluated on text-based embodied environments such as ALFWorld and MiniGrid, ReDAct matches the task success rates of using the large LLM at every step, yet activates the large model in only about 15% of decision steps, substantially reducing computational overhead.
📝 Abstract
Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly derail the trajectory, making hallucinations especially harmful. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this trade-off by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive one. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that deferring only about 15% of decisions to the large model can match the quality of using it exclusively, while significantly reducing inference costs.
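The deferral rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the abstract does not specify which uncertainty measure ReDAct uses or how the threshold is calibrated, so here Shannon entropy of the small model's action distribution stands in for predictive uncertainty, and `redact_step`, `small`, and `large` are hypothetical names with toy stand-in policies.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of an action probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def redact_step(small_model, large_model, observation, threshold):
    """One decision step: act with the cheap small model by default,
    deferring to the large model only when the small model's
    predictive uncertainty exceeds the calibrated threshold.

    Returns (action, deferred_flag).
    """
    action, probs = small_model(observation)
    if entropy(probs) > threshold:
        return large_model(observation), True   # defer: large model decides
    return action, False                        # confident: keep cheap action

# Toy stand-ins for the two policies (hypothetical interfaces): the small
# model returns its chosen action plus a distribution over candidate actions.
def small(obs):
    if obs == "ambiguous":
        return "go left", [0.4, 0.35, 0.25]   # high entropy: model is unsure
    return "open door", [0.97, 0.02, 0.01]    # low entropy: model is confident

def large(obs):
    return "go right"  # expensive but more reliable decision

print(redact_step(small, large, "ambiguous", threshold=0.5))  # defers
print(redact_step(small, large, "clear", threshold=0.5))      # does not defer
```

Over a whole trajectory, the fraction of steps on which the entropy check triggers is exactly the "deferral rate" the paper reports (about 15% of decisions); raising the threshold trades reliability for cost.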