Memory-Driven Self-Improvement for Decision Making with Large Language Models

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of domain adaptation and poor few-shot generalization of large language models (LLMs) in sequential decision-making tasks, this paper proposes a memory-driven self-improvement framework. The method establishes a bidirectional enhancement mechanism between the LLM's prior knowledge and a compact experience memory comprising interaction trajectories and Q-values, enabling continual improvement of decision-making capability via value-aware policy iteration, dynamic experience replay, and closed-loop trajectory optimization. Its core innovation is the tight integration of symbolic reasoning with numerical value estimation, yielding an interpretable and evolvable decision-learning paradigm. Evaluated on the ALFWorld benchmark, the approach achieves over 40% improvement in in-distribution task accuracy and over 75% gain in cross-task zero-shot generalization, significantly outperforming both pure reinforcement learning and state-of-the-art LLM-based baselines.
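The paper itself does not include code; the following is a minimal Python sketch of what the compact experience memory might look like, assuming Monte-Carlo-style Q-value backups over completed trajectories and top-k value-aware retrieval. The `Transition` and `ExperienceMemory` names and interfaces are hypothetical, not the authors' API.

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    state: str       # textual observation, e.g. an ALFWorld scene description
    action: str      # action emitted by the LLM policy
    reward: float
    q_value: float = 0.0  # estimated return; filled in when the trajectory is stored

@dataclass
class ExperienceMemory:
    """Compact store of past interactions and their Q-values (hypothetical sketch)."""
    entries: list = field(default_factory=list)

    def add_trajectory(self, trajectory, gamma=0.99):
        # Back up discounted returns along the trajectory as Monte-Carlo Q estimates.
        g = 0.0
        for t in reversed(trajectory):
            g = t.reward + gamma * g
            t.q_value = g
        self.entries.extend(trajectory)

    def top_k(self, k=5):
        # Value-aware replay: surface the highest-Q experiences, e.g. as
        # in-context exemplars when prompting the LLM.
        return sorted(self.entries, key=lambda t: t.q_value, reverse=True)[:k]
```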

📝 Abstract
Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines LLM general prior knowledge with a compact memory of domain-specific experiences. The memory retains past interactions and associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs refinement of the LLM prior. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich the memory, forming a natural self-improvement framework in which memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40% on in-distribution tasks and by over 75% when generalizing to unseen tasks in ALFWorld.
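Read literally, the abstract describes a closed loop: the memory informs refinement of the prior, and the refined prior writes higher-reward trajectories back into the memory. Below is a hedged sketch of that loop, reusing the `ExperienceMemory` above. The `llm_policy.act`, `llm_policy.refine`, and `env` interfaces are hypothetical stand-ins; the paper may realize the refinement step as fine-tuning, in-context prompting, or something else entirely.

```python
def rollout(llm_policy, env, exemplars, max_steps=50):
    """Collect one trajectory, prompting the policy with high-value exemplars."""
    trajectory, obs = [], env.reset()
    for _ in range(max_steps):
        action = llm_policy.act(obs, exemplars)  # LLM proposes the next action
        next_obs, reward, done = env.step(action)
        trajectory.append(Transition(obs, action, reward))
        obs = next_obs
        if done:
            break
    return trajectory

def self_improvement_loop(llm_policy, env, memory, n_rounds=10):
    for _ in range(n_rounds):
        # 1. Act under the current prior, conditioned on high-value memories.
        trajectory = rollout(llm_policy, env, memory.top_k(k=5))
        # 2. Enrich the memory: store new interactions with backed-up Q-values.
        memory.add_trajectory(trajectory)
        # 3. Refine the prior toward actions the memory scores highly
        #    (the value-aware policy-iteration step, abstracted here).
        llm_policy.refine(memory.top_k(k=50))
    return llm_policy
```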
Problem

Research questions and friction points this paper is trying to address.

Adapting LLMs to specific decision tasks with limited data
Combining general knowledge with domain-specific memory experiences
Enabling mutual improvement between memory and LLM prior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-driven framework combines LLM knowledge with experiences
Compact memory stores past interactions and Q-values
Mutual reinforcement between memory and LLM enables self-improvement
👥 Authors
Xue Yan
Ph.D. student, Institute of Automation, Chinese Academy of Sciences
Machine Learning
Zijing Ou
Imperial College London
Machine Learning
Mengyue Yang
Lecturer, University of Bristol
Causality, Trustworthiness
Yan Song
AI Centre, Department of Computer Science, University College London, London, UK
Haifeng Zhang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yingzhen Li
Imperial College London
Artificial Intelligence, Machine Learning, Statistics
Jun Wang
AI Centre, Department of Computer Science, University College London, London, UK