Fine-tuning with RAG for Improving LLM Learning of New Skills

πŸ“… 2025-10-01
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
LLM agents frequently fail in multi-step tasks due to unmet preconditions, redundant commands, or misjudged environmental constraints. To address this, we propose a lightweight knowledge-internalization method that requires no retrieval at runtime: the retrieval-augmented reasoning of a RAG teacher is distilled into the student model's intrinsic capabilities. Specifically, we automatically extract compact hints from failure trajectories to construct high-quality teacher trajectories, inject each hint once at episode start (one-shot retrieval), and remove the hint strings during training, enabling scalable knowledge transfer across model sizes and architectures. Evaluated on the ALFWorld and WebShop benchmarks, the approach achieves a 91% success rate (+12 percentage points) and a score of 72 (+11), respectively, while reducing inference token consumption by 10–60%. This removes RAG's dependency on external knowledge bases and cuts its computational overhead.

πŸ“ Abstract
Large language model (LLM) agents deployed for multi-step tasks frequently fail in predictable ways: attempting actions with unmet preconditions, issuing redundant commands, or mishandling environment constraints. While retrieval-augmented generation (RAG) can improve performance by providing runtime guidance, it requires maintaining external knowledge databases and adds computational overhead at every deployment. We propose a simple pipeline that converts inference-time retrieval into learned competence through distillation. Our approach: (1) extracts compact, reusable hints from agent failures, (2) uses these hints to generate improved teacher trajectories via one-shot retrieval at episode start, and (3) trains student models on these trajectories with hint strings removed, forcing internalization rather than memorization. Across two interactive benchmarks, ALFWorld (household tasks) and WebShop (online shopping), distilled students consistently outperform baseline agents, achieving up to 91% success on ALFWorld (vs. 79% for baselines) and improving WebShop scores to 72 (vs. 61 for baselines), while using 10–60% fewer tokens than retrieval-augmented teachers depending on the environment. The approach generalizes across model scales (7B/14B parameters) and agent architectures (ReAct/StateAct), demonstrating that retrieval benefits can be effectively internalized through targeted fine-tuning without permanent runtime dependencies.
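To make the three-stage pipeline concrete, below is a minimal Python sketch of the distillation loop. All names here (extract_hint, make_teacher_trajectory, strip_hints, the [HINT] tag) are hypothetical stand-ins for illustration; the paper's actual prompts, rollout code, and data formats are not specified in this summary.

```python
# Minimal sketch of the three-stage distillation pipeline described above.
# Function names, the [HINT] tag, and the stand-in rollout are assumptions,
# not the paper's implementation.

import json
import re

HINT_TAG = re.compile(r"\[HINT\].*?\[/HINT\]", flags=re.DOTALL)

def extract_hint(failure_trajectory: str) -> str:
    """Stage 1: distill a compact, reusable hint from a failed episode.
    Placeholder; the paper derives hints automatically from failures."""
    return "[HINT] Check that the microwave is open before putting an object in. [/HINT]"

def make_teacher_trajectory(task: str, hint: str) -> dict:
    """Stage 2: rerun the episode with the hint injected once at the start
    (one-shot retrieval), yielding an improved teacher trajectory."""
    prompt = f"{hint}\n{task}"
    actions = ["open microwave", "put mug in microwave"]  # stand-in for an agent rollout
    return {"prompt": prompt, "actions": actions}

def strip_hints(trajectory: dict) -> dict:
    """Stage 3: remove the hint string before fine-tuning, so the student
    must internalize the behavior rather than copy the hint."""
    return {
        "prompt": HINT_TAG.sub("", trajectory["prompt"]).strip(),
        "actions": trajectory["actions"],
    }

if __name__ == "__main__":
    task = "Task: heat a mug of water and put it on the table."
    hint = extract_hint("...failed trajectory text...")
    teacher = make_teacher_trajectory(task, hint)
    student_example = strip_hints(teacher)
    print(json.dumps(student_example, indent=2))  # hint-free supervised example
```

The key design choice is stage 3: because the hint text never appears in the student's training prompt, the model must reproduce the hinted behavior from the task context alone, which is what lets retrieval be dropped at deployment.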
Problem

Research questions and friction points this paper is trying to address.

How to reduce predictable failures of LLM agents in multi-step tasks (unmet preconditions, redundant commands, mishandled environment constraints)
How to convert the runtime guidance of retrieval-augmented generation into internalized model competence
How to eliminate runtime retrieval dependencies and their maintenance overhead without sacrificing task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts inference-time retrieval into learned competence via distillation
Automatically extracts compact, reusable hints from agent failure trajectories to build improved teacher trajectories
Trains student models on these trajectories with hint strings removed, forcing internalization rather than memorization (see the sketch after this list)
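As a concrete illustration of the last point, here is a hedged sketch of how hint-free teacher trajectories could be serialized into supervised fine-tuning records. The chat-messages schema and file name are assumptions for illustration, not the paper's documented format.

```python
# Hedged sketch: packaging hint-free teacher trajectories as supervised
# fine-tuning records in the common chat-messages format. The schema is an
# assumption for illustration, not the paper's data format.

import json

def to_sft_record(observation: str, actions: list[str]) -> dict:
    """One training record per episode: the environment observation as the
    user turn, the teacher's action sequence as the assistant turn."""
    return {
        "messages": [
            {"role": "user", "content": observation},
            {"role": "assistant", "content": "\n".join(actions)},
        ]
    }

episode = {
    "prompt": "Task: heat a mug of water and put it on the table.",  # hint already stripped
    "actions": ["open microwave", "put mug in microwave", "heat mug"],
}

with open("student_sft.jsonl", "w") as f:
    f.write(json.dumps(to_sft_record(episode["prompt"], episode["actions"])) + "\n")
```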
Humaid Ibrahim
Department of Computing, Imperial College London
Nikolai Rozanov
Department of Computing, Imperial College London
Marek Rei
Associate Professor, Imperial College London
Artificial Intelligence Β· Language Modeling Β· Machine Learning Β· Natural Language Processing