🤖 AI Summary
To address the challenges of domain adaptation and poor few-shot generalization of large language models (LLMs) in sequential decision-making tasks, this paper proposes a memory-driven self-improvement framework. The method establishes a bidirectional reinforcement mechanism between the LLM's prior knowledge and a compact experience memory of interaction trajectories and Q-values, enabling continual improvement of decision-making capability through value-aware policy iteration, dynamic experience replay, and closed-loop trajectory optimization. Its core contribution is the tight integration of symbolic reasoning with numerical value estimation, yielding an interpretable and evolvable decision-learning paradigm. Evaluated on the ALFWorld benchmark, the approach improves in-distribution task accuracy by over 40% and generalization to unseen tasks by over 75%, significantly outperforming both pure reinforcement learning and state-of-the-art LLM-based baselines.
📝 Abstract
Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it difficult to adapt LLMs efficiently to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines the LLM's general prior knowledge with a compact memory of domain-specific experiences. The memory retains past interactions and their associated Q-values, capturing decision-relevant knowledge that enables accurate value estimation and informs refinement of the LLM prior. The refined prior, in turn, generates higher-reward trajectories that further enrich the memory, yielding a natural self-improvement loop in which the memory and the LLM prior reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40% on in-distribution tasks and by over 75% when generalizing to unseen tasks in ALFWorld.
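
To make the loop concrete, here is a minimal, hypothetical Python sketch of the memory-and-prior interaction described in the abstract. The environment interface (`env.reset`, `env.step`), the retrieval and value-estimation heuristics, and the `llm_propose_action` stand-in are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the memory-driven self-improvement loop.
# All names and interfaces below are assumptions for illustration only.
import random


class ExperienceMemory:
    """Compact store of past interactions and their estimated Q-values."""

    def __init__(self):
        self.entries = []  # each entry: (state, action, reward, q_value)

    def add(self, state, action, reward, q_value):
        self.entries.append((state, action, reward, q_value))

    def retrieve(self, state, k=5):
        # Placeholder retrieval: a real system would use similarity search
        # over states; here we simply sample stored experiences.
        return random.sample(self.entries, min(k, len(self.entries)))


def llm_propose_action(state, retrieved, default_action="look"):
    # Stand-in for the LLM prior: prefer the highest-Q action among the
    # retrieved experiences; a real system would prompt the LLM with them.
    if retrieved:
        _, action, _, _ = max(retrieved, key=lambda e: e[3])
        return action
    return default_action


def estimate_q(state, action, reward, next_state, memory, gamma=0.99):
    # Assumed value estimate: one-step return bootstrapped from the best
    # stored Q-value for the next state (a simplification of value-aware
    # policy iteration).
    future = [q for (s, _, _, q) in memory.entries if s == next_state]
    return reward + gamma * (max(future) if future else 0.0)


def self_improvement_episode(env, memory):
    """One episode: the prior acts, values are estimated, memory is enriched."""
    state, done = env.reset(), False
    while not done:
        retrieved = memory.retrieve(state)
        action = llm_propose_action(state, retrieved)
        next_state, reward, done = env.step(action)  # assumed env interface
        q = estimate_q(state, action, reward, next_state, memory)
        memory.add(state, action, reward, q)  # memory refines the prior...
        state = next_state                    # ...and the prior refills memory
```

The sketch only illustrates the mutual-reinforcement structure (memory informs action selection; new trajectories and Q-values enrich memory); the retrieval, value-update, and prior-refinement details of the actual framework differ.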