Rational Decision-Making Agent with Internalized Utility Judgment

2023-08-24
Citations: 9
Influential: 0
AI Summary
Existing LLM-based decision-making methods rely on manually designed external performance metrics, which are often unavailable or unreliable in real-world scenarios. This work proposes RadAgent, the first rational decision-making agent capable of intrinsic utility assessment, eliminating dependence on external metrics and enabling autonomous evolution of rationality from posterior experience. Our core innovation is an Elo-based pairwise comparison mechanism for utility construction: it dynamically assigns Elo scores to each decision step, facilitating utility internalization. We further design an end-to-end iterative optimization framework that integrates experience exploration, utility learning, and Elo scoring. Evaluated on ToolBench, RadAgent achieves an improvement of over 10% in Pass Rate, delivers higher solution quality, and significantly reduces ChatGPT API invocation costs.
πŸ“ Abstract
Large language models (LLMs) have advanced remarkably, prompting significant efforts to develop them into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications. Existing approaches to LLM-based decision-making predominantly build upon manually designed external performance metrics to guide the decision-making process. However, reliance on external performance metrics as a prior is problematic in real-world scenarios, where such a prior may be unavailable, flawed, or even erroneous. For genuinely autonomous decision-making, the agent must develop its rationality from its posterior experiences so that it can judge decisions independently. Central to the development of rationality is the construction of an internalized utility judgment, capable of assigning a numerical utility to each decision. This paper proposes RadAgent (Rational Decision-Making Agent), which develops its rationality through an iterative framework involving Experience Exploration and Utility Learning. Within this framework, Elo-based Utility Construction assigns Elo scores to individual decision steps, judging their utilities via pairwise comparisons. These Elo scores then guide the decision-making process toward optimal outcomes. Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, with over 10% improvement in Pass Rate on diverse tasks. It also offers higher-quality solutions and reduces costs (ChatGPT API calls), highlighting its effectiveness and efficiency.
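The abstract says that decision steps receive Elo scores through pairwise comparisons, but it does not give the update rule. As a rough illustration, the sketch below uses the standard Elo rating update from chess. The function names, the K-factor of 32, the 400-point scale, and the starting rating of 1000 are all assumptions made for this example and are not taken from the paper.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    outcome_a is 1.0 if A was judged better, 0.0 if B was, and 0.5 for a tie.
    The K-factor k is an assumed value, not one reported in the source.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (outcome_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two hypothetical decision steps start at the same rating; step_a wins once.
scores = {"step_a": 1000.0, "step_b": 1000.0}
scores["step_a"], scores["step_b"] = elo_update(
    scores["step_a"], scores["step_b"], outcome_a=1.0
)
print(scores)  # {'step_a': 1016.0, 'step_b': 984.0}
```

In the setting described above, the comparison outcome would come from the agent's own judgment of two decision sequences rather than from an external metric; here it is simply passed in as a number.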
Problem

Research questions and friction points this paper is trying to address.

Developing LLM-based agents for autonomous multi-step decision-making
Overcoming reliance on flawed external performance metrics
Internalizing utility judgment via experience-driven Elo score learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Internalized utility judgment for autonomous decisions
Elo-based Utility Construction for step evaluation
Iterative Experience Exploration and Utility Learning
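The summary states that Elo scores guide the decision-making process during exploration, without specifying how. One plausible reading is that higher-rated steps are explored more often. The sketch below illustrates that reading with a softmax over Elo scores. The softmax rule, the `temperature` parameter, and the candidate names are assumptions introduced for illustration only.

```python
import math
import random


def selection_probabilities(elo_scores: dict, temperature: float = 100.0) -> dict:
    """Softmax over Elo scores: higher-rated steps get more probability mass."""
    top = max(elo_scores.values())  # subtract the max for numerical stability
    weights = {
        name: math.exp((score - top) / temperature)
        for name, score in elo_scores.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def pick_step(elo_scores: dict, rng: random.Random, temperature: float = 100.0) -> str:
    """Sample one candidate step to explore next, weighted by its Elo score."""
    probs = selection_probabilities(elo_scores, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Hypothetical candidate steps with Elo scores learned from earlier comparisons.
candidates = {"call_search_api": 1060.0, "call_weather_api": 1000.0, "give_up": 940.0}
rng = random.Random(0)
print(selection_probabilities(candidates))
print(pick_step(candidates, rng))
```

A lower temperature concentrates exploration on the top-rated step, while a higher one spreads it more evenly, which is the usual trade-off between exploiting learned utilities and gathering new experience.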