AI Summary
This work addresses the challenge of inefficient in-context learning (ICL) utilization by language model agents in sequential decision-making tasks. To this end, we propose a systematic ICL framework comprising three core components: (1) an LLM-based automatic trajectory annotation algorithm that leverages iterative refinement to alleviate the bottleneck of manually labeling long trajectories; (2) a similarity-aware, dynamic demonstration retrieval mechanism that enhances demonstration relevance and robustness across diverse tasks; and (3) a novel stepwise trajectory chunking prompting strategy that preserves critical decision logic while substantially reducing inference overhead. Experiments demonstrate that small language models equipped with our ICL framework match or exceed the performance of costly fine-tuned large-model agents, and achieve further gains when demonstrations are generated by large models. Overall, the approach significantly improves reliability, inference efficiency, and cross-task generalization.
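The first component, automatic trajectory annotation with retries, can be pictured as a simple loop: roll out an LLM agent on a task (prompted with already-annotated demonstrations), keep the trajectory only if it succeeds, and retry otherwise. Below is a minimal sketch under stated assumptions; `run_agent` and `is_successful` are hypothetical stand-ins for the paper's LLM rollout and environment-level success check, not actual APIs from this work.

```python
def annotate_with_retries(task, demos, run_agent, is_successful, max_retries=3):
    """Try to produce a successful solution trajectory for `task`.

    Each attempt prompts the LLM agent (`run_agent`, a hypothetical
    callable) with previously annotated demonstrations; failed attempts
    are retried up to `max_retries` times.  Returns the first successful
    trajectory, or None if the task could not be annotated.
    """
    for _attempt in range(max_retries):
        trajectory = run_agent(task, demos)   # e.g. a list of (state, action) steps
        if is_successful(trajectory):
            return trajectory                 # candidate for the demonstration pool
    return None                               # task left unannotated
```

Successful trajectories returned by this loop would then feed the demonstration pool used by the retrieval component.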
Abstract
In-context learning (ICL) with dynamically selected demonstrations combines the flexibility of prompting large language models (LLMs) with the ability to leverage training data to improve performance. While ICL has been highly successful for prediction and generation tasks, leveraging it for agentic tasks that require sequential decision making is challenging: one must think not only about how to annotate long trajectories at scale and how to select demonstrations, but also about what constitutes a demonstration, and when and where to show it. To address this, we first propose an algorithm that leverages an LLM with retries, along with demonstrations, to automatically and efficiently annotate agentic tasks with solution trajectories. We then show that set-selection of trajectories of similar tasks as demonstrations significantly improves the performance, reliability, robustness, and efficiency of LLM agents. However, trajectory demonstrations carry a large inference-cost overhead. We show that this can be mitigated by using small trajectory snippets at every step instead of an additional full trajectory. We find that demonstrations obtained from larger models (in the annotation phase) also improve smaller models, and that ICL agents can even rival costlier trained agents. Thus, our results reveal that ICL, with careful use, can be very powerful for agentic tasks as well.
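The set-selection step, retrieving trajectories of the most similar previously annotated tasks as demonstrations, can be sketched as a top-k ranking over task descriptions. This is a minimal illustration only: Jaccard word overlap stands in for whatever similarity measure the system actually uses, and `select_demonstrations` is a hypothetical helper, not an API from this work.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two task descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0

def select_demonstrations(task: str, annotated: list, k: int = 2) -> list:
    """Return the trajectories of the k most similar annotated tasks.

    `annotated` is a list of (task_description, trajectory) pairs
    produced by the annotation phase.
    """
    ranked = sorted(annotated, key=lambda td: jaccard(task, td[0]), reverse=True)
    return [trajectory for _, trajectory in ranked[:k]]
```

In use, the selected trajectories (or, to cut inference cost, short per-step snippets extracted from them) would be prepended to the agent's prompt at decision time.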