Feedback-Induced Performance Decline in LLM-Based Decision-Making

📅 2025-07-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the zero-shot autonomous decision-making capabilities of large language models (LLMs) in Markov decision processes (MDPs), probing their fundamental limits. Methodologically, it evaluates LLMs on increasingly complex MDP tasks—ranging from simple state-action mappings to multi-step sequential planning—and introduces online structured feedback to assess robustness to dynamic environmental signals. Results reveal that while LLMs achieve moderate performance on basic MDPs, they exhibit severe deficits in long-horizon planning, temporal credit assignment, and state-consistent reasoning. Crucially, online feedback degrades rather than improves performance, exposing a novel “feedback-induced confusion” phenomenon: LLMs misinterpret or overreact to structured feedback due to insufficient internal state maintenance. Systematic comparison with classical RL methods confirms that prompt engineering alone cannot overcome these architectural limitations. The study thus advocates hybrid architectures integrating fine-tuning, external memory, and modular decision components to enhance reliability—providing both theoretical insight and practical guidance for LLM-based autonomous agent design.
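The evaluation loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual setup: the chain MDP, the prompt format, and the stub `llm_policy` (which stands in for a real LLM call) are all assumptions made here for concreteness.

```python
# Sketch: an LLM agent acting in a small MDP, with structured feedback from
# the previous step appended to the next prompt. The environment, prompt
# wording, and stub policy are illustrative assumptions.

# A 1-D chain MDP: states 0..4, actions "left"/"right", reward 1 at state 4.
N_STATES = 5

def step(state, action):
    """Deterministic transition; reward only at the rightmost state."""
    next_state = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def build_prompt(state, feedback):
    """Zero-shot prompt, with optional structured feedback from the last step."""
    prompt = f"You are at state {state} of {N_STATES - 1}. Choose 'left' or 'right'."
    if feedback is not None:
        prompt += f"\nFeedback: last action '{feedback['action']}' gave reward {feedback['reward']}."
    return prompt

def llm_policy(prompt):
    """Placeholder for an LLM call; a trivial heuristic stands in here."""
    return "right"

def rollout(horizon=10):
    state, total, feedback = 0, 0.0, None
    for _ in range(horizon):
        action = llm_policy(build_prompt(state, feedback))
        state, reward = step(state, action)
        total += reward
        feedback = {"action": action, "reward": reward}  # fed into the next prompt
    return total
```

The point of interest in the paper is the `feedback` channel: the finding is that injecting this kind of structured signal into the prompt can degrade rather than improve the agent's decisions.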

📝 Abstract
The ability of Large Language Models (LLMs) to extract context from natural language problem descriptions naturally raises questions about their suitability in autonomous decision-making settings. This paper studies the behaviour of these models within Markov Decision Processes (MDPs). While traditional reinforcement learning (RL) strategies commonly employed in this setting rely on iterative exploration, LLMs, pre-trained on diverse datasets, offer the capability to leverage prior knowledge for faster adaptation. We investigate online structured prompting strategies in sequential decision-making tasks, comparing the zero-shot performance of LLM-based approaches to that of classical RL methods. Our findings reveal that although LLMs demonstrate improved initial performance in simpler environments, they struggle with planning and reasoning in complex scenarios without fine-tuning or additional guidance. Our results show that feedback mechanisms, intended to improve decision-making, often introduce confusion, leading to diminished performance in intricate environments. These insights underscore the need for further exploration into hybrid strategies, fine-tuning, and advanced memory integration to enhance LLM-based decision-making capabilities.
Problem

Research questions and friction points this paper is trying to address.

LLMs' suitability in autonomous decision-making contexts
Performance decline due to feedback-induced confusion
Challenges in complex planning without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online structured prompting for sequential decisions
Comparing LLM zero-shot to classical RL
Hybrid strategies for complex environments
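The classical RL baselines that the zero-shot LLM approaches are compared against can be as simple as tabular Q-learning. A minimal self-contained sketch, where the chain environment and hyperparameters are illustrative assumptions rather than the paper's experimental setup:

```python
import random

# Hypothetical tabular Q-learning baseline of the kind LLM agents are
# compared against; the chain MDP and hyperparameters are illustrative.

N_STATES, ACTIONS = 5, (0, 1)  # actions: 0 = left, 1 = right

def step(state, action):
    """Deterministic chain: reward 1 only at the rightmost state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def train(episodes=500, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    """Q-learning with pure random exploration; since Q-learning is
    off-policy, this suffices on a chain this small."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = rng.choice(ACTIONS)
            s2, r = step(s, a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]  # learned policy
```

Unlike the zero-shot LLM agent, this baseline needs many environment interactions before it behaves sensibly, which is the trade-off the comparison is probing: prior knowledge and fast adaptation versus slow but reliable iterative exploration.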