Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation

📅 2025-10-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

LLM agents struggle to adapt to unfamiliar domains, and existing remedies rely on costly online interaction or fine-tuning. This paper proposes a lightweight prompt augmentation method built on offline trajectories, with three key contributions: (1) trajectory distillation coupled with state-matching retrieval, which extracts critical decision cues from long, noisy raw trajectories; (2) context-aware feedback synthesis that draws on both successful and failed trajectories, so guidance can be extracted even from failure-only data; and (3) a scalable, parallelizable prompt generation framework with an adaptive scaling mechanism that supports benchmark-agnostic prompt construction. Evaluated on MiniWoB++, WorkArena-L1, and WebArena-Lite, the method consistently outperforms strong baselines, including handcrafted prompts and documentation-based prompting, a result that points to its efficiency, generalization across domains, and practicality for deployment.

📝 Abstract

Large language model (LLM) agents perform well in sequential decision-making tasks, but improving them on unfamiliar domains often requires costly online interactions or fine-tuning on large expert datasets. These strategies are impractical for closed-source models and expensive for open-source ones, with risks of catastrophic forgetting. Offline trajectories offer reusable knowledge, yet demonstration-based methods struggle because raw traces are long, noisy, and tied to specific tasks. We present Just-in-time Episodic Feedback Hinter (JEF Hinter), an agentic system that distills offline traces into compact, context-aware hints. A zooming mechanism highlights decisive steps in long trajectories, capturing both strategies and pitfalls. Unlike prior methods, JEF Hinter leverages both successful and failed trajectories, extracting guidance even when only failure data is available, while supporting parallelized hint generation and benchmark-independent prompting. At inference, a retriever selects relevant hints for the current state, providing targeted guidance with transparency and traceability. Experiments on MiniWoB++, WorkArena-L1, and WebArena-Lite show that JEF Hinter consistently outperforms strong baselines, including human- and document-based hints.
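The inference-time step described in the abstract, where a retriever selects hints relevant to the current state, can be illustrated with a minimal sketch. This is not the paper's retriever: the `Hint` record, the `retrieve_hints` and `augment_prompt` functions, and the bag-of-words cosine similarity are all assumptions chosen to keep the example dependency-free. A real system would more likely use learned embeddings or an LLM-based matcher.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    text: str       # guidance shown to the agent
    source_id: str  # hypothetical id of the originating trace, kept for traceability


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_hints(state: str, hints: list[Hint], k: int = 2) -> list[Hint]:
    """Return up to k hints most similar to the state, dropping zero-overlap hints."""
    query = _bag_of_words(state)
    scored = [(_cosine(query, _bag_of_words(h.text)), h) for h in hints]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored[:k]]


def augment_prompt(base_prompt: str, state: str, hints: list[Hint], k: int = 2) -> str:
    """Append the retrieved hints, with their source ids, to the agent prompt."""
    selected = retrieve_hints(state, hints, k)
    if not selected:
        return base_prompt
    lines = [f"- {h.text} (from {h.source_id})" for h in selected]
    return base_prompt + "\n\nHints:\n" + "\n".join(lines)
```

Keeping `source_id` next to each hint is one simple way to get the transparency and traceability the abstract mentions: every piece of guidance in the prompt can be traced back to the offline trace it came from.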

Problem

Research questions and friction points this paper is trying to address.

Improving LLM agent adaptation without costly online interactions

Distilling noisy offline trajectories into compact actionable hints

Leveraging both successful and failed trajectories for guidance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distills offline traces into compact context-aware hints

Zooming mechanism highlights decisive trajectory steps

Retriever selects relevant hints for targeted guidance
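The offline side of the pipeline (zooming in on decisive steps, then generating hints from both successful and failed traces in parallel) can be sketched as follows. Everything here is a hypothetical illustration rather than the paper's implementation: the `Trajectory` record, the `is_decisive` predicate, the window-based `zoom`, the prompt wording, and the `summarize` callable (a stand-in for an LLM call) are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    task: str
    steps: list[str]
    success: bool


def zoom(traj: Trajectory, is_decisive: Callable[[str], bool], radius: int = 1) -> list[str]:
    """Keep steps within `radius` of any step flagged as decisive.

    Falls back to the full trajectory when no step is flagged.
    """
    keep: set[int] = set()
    for i, step in enumerate(traj.steps):
        if is_decisive(step):
            keep.update(range(max(0, i - radius), min(len(traj.steps), i + radius + 1)))
    if not keep:
        return list(traj.steps)
    return [traj.steps[i] for i in sorted(keep)]


def build_hint_prompt(traj: Trajectory, focused_steps: list[str]) -> str:
    """Build a hint-generation prompt; failed traces ask for pitfalls, successes for strategies."""
    outcome = "succeeded" if traj.success else "failed"
    goal = "the strategy worth reusing" if traj.success else "the pitfall to avoid"
    body = "\n".join(f"{n}. {s}" for n, s in enumerate(focused_steps, start=1))
    return f"Task: {traj.task} ({outcome})\nSteps:\n{body}\nWrite one short hint describing {goal}."


def generate_hints(
    trajectories: list[Trajectory],
    summarize: Callable[[str], str],      # assumed stand-in for an LLM call
    is_decisive: Callable[[str], bool],
    max_workers: int = 4,
) -> list[str]:
    """Generate one hint per trajectory in parallel, preserving input order."""
    prompts = [build_hint_prompt(t, zoom(t, is_decisive)) for t in trajectories]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, prompts))
```

Because each trajectory is processed independently, the work parallelizes trivially, and a dataset containing only failed traces still yields hints, framed as pitfalls instead of strategies.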

🔎 Similar Papers

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning