Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks

📅 2025-05-01

🤖 AI Summary

Problem: Large language model (LLM) agents for sequential decision-making often depend on manual knowledge engineering, such as prompt design, hand-crafted demonstrations, and task-specific observation and action spaces, which ties performance to the engineering effort invested. Method: The paper studies how an agent can improve automatically by learning in-context from its own successful trajectories on similar tasks. It (i) builds and refines a database of self-generated successful trajectories; (ii) adds two selection mechanisms, database-level selection through population-based training and exemplar-level selection that retains individual trajectories by their empirical utility as in-context examples; and (iii) retrieves in-context examples from that database dynamically rather than using fixed curated demonstrations. Contribution/Results: Naive accumulation of successful trajectories alone raises test performance on ALFWorld (73% to 89%), Wordcraft (55% to 64%), and InterCode-SQL (75% to 79%), matching what the initial agent achieves when allowed two to three attempts per task. The two selection extensions raise ALFWorld further to 91%, matching more complex approaches that rely on task-specific components and prompts.

📝 Abstract

Many methods for improving Large Language Model (LLM) agents for sequential decision-making tasks depend on task-specific knowledge engineering--such as prompt tuning, curated in-context examples, or customized observation and action spaces. Using these approaches, agent performance improves with the quality or amount of knowledge engineering invested. Instead, we investigate how LLM agents can automatically improve their performance by learning in-context from their own successful experiences on similar tasks. Rather than relying on task-specific knowledge engineering, we focus on constructing and refining a database of self-generated examples. We demonstrate that even a naive accumulation of successful trajectories across training tasks boosts test performance on three benchmarks: ALFWorld (73% to 89%), Wordcraft (55% to 64%), and InterCode-SQL (75% to 79%)--matching the performance the initial agent achieves if allowed two to three attempts per task. We then introduce two extensions: (1) database-level selection through population-based training to identify high-performing example collections, and (2) exemplar-level selection that retains individual trajectories based on their empirical utility as in-context examples. These extensions further enhance performance, achieving 91% on ALFWorld--matching more complex approaches that employ task-specific components and prompts. Our results demonstrate that automatic trajectory database construction offers a compelling alternative to labor-intensive knowledge engineering.
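The naive accumulation step described in the abstract can be pictured with a small sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the `Trajectory` and `TrajectoryDB` names, the plain-text step format, and the word-overlap (Jaccard) similarity, which stands in for whatever retriever the authors actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task: str          # natural-language task description
    steps: list[str]   # observation/action log as plain text (assumed format)
    success: bool      # whether the episode solved the task


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a stand-in retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


@dataclass
class TrajectoryDB:
    items: list[Trajectory] = field(default_factory=list)

    def add_if_successful(self, traj: Trajectory) -> bool:
        """Naive accumulation: store every successful trajectory, drop failures."""
        if traj.success:
            self.items.append(traj)
        return traj.success

    def retrieve(self, task: str, k: int = 2) -> list[Trajectory]:
        """Return up to k stored trajectories whose task text is most similar."""
        ranked = sorted(
            self.items,
            key=lambda t: token_overlap(task, t.task),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(task: str, examples: list[Trajectory]) -> str:
    """Prepend retrieved trajectories to the new task as in-context examples."""
    blocks = [f"Task: {ex.task}\n" + "\n".join(ex.steps) for ex in examples]
    blocks.append(f"Task: {task}")
    return "\n\n".join(blocks)
```

In this sketch the agent would call `add_if_successful` after each training episode and `retrieve` plus `build_prompt` before each new episode; no prompt text or demonstration is written by hand.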

Problem

Research questions and friction points this paper is trying to address.

Improving LLM agents without task-specific engineering

Automating learning from self-generated successful examples

Enhancing sequential decision-making via trajectory databases

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-generated in-context examples boost performance

Population-based training selects high-performing example collections

Empirical utility retains useful individual trajectories
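The two selection ideas listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the `Exemplar` class, the utility definition (fraction of episodes that succeeded when the exemplar was in the prompt), the thresholds, and the `exploit_step` helper are all hypothetical, and the population-based step shows only the generic "copy the best over the worst" move.

```python
import copy
from dataclasses import dataclass


@dataclass
class Exemplar:
    text: str
    uses: int = 0   # times this exemplar appeared in a prompt
    wins: int = 0   # how many of those episodes succeeded

    def record(self, success: bool) -> None:
        self.uses += 1
        self.wins += int(success)

    @property
    def utility(self) -> float:
        """Assumed utility: empirical success rate when used in-context."""
        return self.wins / self.uses if self.uses else 0.0


def prune_exemplars(db, min_uses=3, min_utility=0.5):
    """Exemplar-level selection: keep untested exemplars and useful ones."""
    return [e for e in db if e.uses < min_uses or e.utility >= min_utility]


def exploit_step(population, scores, n_replace=1):
    """Database-level selection: overwrite the lowest-scoring databases
    with deep copies of the highest-scoring ones."""
    n_replace = min(n_replace, len(population) // 2)
    order = sorted(range(len(population)), key=lambda i: scores[i])
    worst = order[:n_replace]
    best = order[len(order) - n_replace:]
    new_population = list(population)
    for w, b in zip(worst, reversed(best)):
        new_population[w] = copy.deepcopy(population[b])
    return new_population
```

Here `scores` would be each database's measured task success rate; after an `exploit_step`, each copy continues to accumulate its own trajectories, so the copies can diverge again.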
