Rational Decision-Making Agent with Internalized Utility Judgment

📅 2023-08-24

📈 Citations: 9

✨ Influential: 0

🤖 AI Summary

Existing LLM-based decision-making methods rely on manually designed external performance metrics, which are often unavailable or unreliable in real-world scenarios. This work proposes RadAgent, a rational decision-making agent that judges utility internally, removing the dependence on external metrics and letting the agent develop its rationality from posterior experience. Its core mechanism is Elo-based Utility Construction: pairwise comparisons between decision steps assign each step an Elo score, which serves as its internalized utility. An iterative framework alternates Experience Exploration and Utility Learning, and the resulting Elo scores guide subsequent decisions. On ToolBench, RadAgent improves Pass Rate by over 10% relative to baselines, produces higher-quality solutions, and reduces the number of ChatGPT API calls.

📝 Abstract

Large language models (LLMs) have advanced remarkably, prompting significant efforts to develop them into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications. Existing approaches to LLM-based decision-making predominantly rely on manually designed external performance metrics to guide the decision-making process. However, relying on external performance metrics as a prior is problematic in real-world scenarios, where such a prior may be unavailable, flawed, or even erroneous. For genuinely autonomous decision-making, the agent must develop its rationality from its posterior experiences so that it can judge decisions independently. Central to the development of rationality is the construction of an internalized utility judgment capable of assigning a numerical utility to each decision. This paper proposes RadAgent (Rational Decision-Making Agent), which develops its rationality through an iterative framework of Experience Exploration and Utility Learning. Within this framework, Elo-based Utility Construction assigns Elo scores to individual decision steps, judging their utilities via pairwise comparisons. These Elo scores then guide the decision-making process toward optimal outcomes. Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, with over 10% improvement in Pass Rate on diverse tasks. It also offers higher-quality solutions and reduces costs (ChatGPT API calls), highlighting its effectiveness and efficiency.
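To make the idea of assigning Elo scores through pairwise comparisons concrete, here is a minimal sketch of the standard Elo rating update. The abstract does not give the paper's exact update rule or constants, so the function names, the K-factor of 32, the scale of 400, and the starting rating of 1000 are all assumptions borrowed from conventional Elo usage.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    outcome_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k (assumed value) controls how far a single comparison moves the ratings.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two decision steps start level; the comparison prefers step A.
new_a, new_b = elo_update(1000.0, 1000.0, outcome_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

A win over an equally rated opponent moves each rating by half of `k`, while an expected win over a much weaker opponent moves the ratings only slightly, which is why repeated comparisons tend to settle into a stable ordering.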

Problem

Research questions and friction points this paper is trying to address.

Developing LLM-based agents for autonomous multi-step decision-making

Overcoming reliance on flawed external performance metrics

Internalizing utility judgment via experience-driven Elo score learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Internalized utility judgment for autonomous decisions

Elo-based Utility Construction for step evaluation

Iterative Experience Exploration and Utility Learning
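
The three ideas listed above can be tied together in a toy loop. This is only an illustrative sketch, not the paper's algorithm: the softmax selection rule, the `temperature` value, the `judge` callback (a hypothetical stand-in for an LLM pairwise comparison), and every constant are assumptions introduced here.

```python
import math
import random


def selection_probabilities(ratings, temperature=100.0):
    """Softmax over Elo ratings: higher-rated candidates are sampled more often."""
    top = max(ratings.values())  # subtract the max for numerical stability
    weights = {name: math.exp((r - top) / temperature) for name, r in ratings.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def explore_and_learn(ratings, judge, rounds=50, k=32.0, seed=0):
    """Alternate exploration (sample a pair) and utility learning (Elo update).

    judge(a, b) is hypothetical: it returns 1.0 if a is preferred, 0.0 if b is
    preferred, and 0.5 for a tie. Requires at least two candidates.
    """
    rng = random.Random(seed)
    ratings = dict(ratings)  # leave the caller's dict untouched
    for _ in range(rounds):
        probs = selection_probabilities(ratings)
        names = list(probs)
        # Exploration: pick one candidate by Elo-weighted sampling, then a rival.
        a = rng.choices(names, weights=[probs[n] for n in names])[0]
        b = rng.choice([n for n in names if n != a])
        # Utility learning: fold the comparison outcome into both ratings.
        expected = 1.0 / (1.0 + 10.0 ** ((ratings[b] - ratings[a]) / 400.0))
        delta = k * (judge(a, b) - expected)
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


# Toy judge with a hidden quality ordering that the loop has to recover.
quality = {"x": 3, "y": 2, "z": 1}
learned = explore_and_learn(
    {name: 1000.0 for name in quality},
    judge=lambda a, b: 1.0 if quality[a] > quality[b] else 0.0,
)
print(sorted(learned, key=learned.get, reverse=True))
```

No external metric appears anywhere in the loop; the only signal is the pairwise judgment, which is the property the summary emphasizes.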
