Online-Optimized RAG for Tool Use and Function Calling

📅 2025-09-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses query-tool embedding misalignment in retrieval-augmented generation (RAG) for tool use, which arises from imperfect embedding models or noisy tool descriptions and can cause incorrect retrieval and task failure. It proposes Online-Optimized RAG, a lightweight framework that operates at deployment time without modifying the underlying large language model. Using minimal feedback from live interactions (e.g., task success), it applies online gradient updates to the retrieval embeddings with negligible per-query latency, and it supports single- and multi-hop tool use, evolving tool inventories, and $K$-retrieval with re-ranking. The paper also provides a problem-dependent theoretical analysis of how performance depends on the initialization quality of the embeddings. Across diverse tool-use and document-retrieval scenarios, the method consistently improves tool selection accuracy and end-task success.

📝 Abstract

In many applications, retrieval-augmented generation (RAG) drives tool use and function calling by embedding the (user) queries and matching them to pre-specified tool/function descriptions. In this paper, we address an embedding misalignment issue that often arises in practical applications due to imperfect embedding models or noisy descriptions; such misalignment may lead to incorrect retrieval and task failure. We introduce Online-Optimized RAG, a deployment-time framework that continually adapts retrieval embeddings from live interactions using minimal feedback (e.g., task success). Online-Optimized RAG applies lightweight online gradient updates with negligible per-query latency and requires no changes to the underlying LLM. The method is plug-and-play: it supports both single- and multi-hop tool use, dynamic tool inventories, and $K$-retrieval with re-ranking. We provide a problem-dependent theoretical analysis that quantifies how the method's performance depends on the initialization quality of the embeddings and other related quantities. Across diverse tool-use and document-retrieval scenarios, our Online-Optimized RAG consistently improves tool selection accuracy and end-task success, thus providing a simple, practical path to robust, self-improving RAG systems.
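
The abstract does not spell out the update rule, so the following is only a minimal sketch of what "lightweight online gradient updates from task-success feedback" could look like. It assumes a softmax over dot-product scores and a REINFORCE-style step on the tool embeddings; the function names, the learning rate `lr`, and the fixed `baseline` are all illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    z = scores - scores.max()
    p = np.exp(z)
    return p / p.sum()

def online_update(tool_emb, query_emb, chosen, reward, lr=0.5, baseline=0.5):
    """One illustrative policy-gradient step on the tool embeddings.

    tool_emb:  (n_tools, d) float array, modified in place
    query_emb: (d,) float array for the current query
    chosen:    index of the tool that was retrieved and executed
    reward:    1.0 if the task succeeded, 0.0 otherwise (assumed feedback)
    """
    probs = softmax(tool_emb @ query_emb)
    # Gradient of log p(chosen) w.r.t. the scores: onehot(chosen) - probs.
    grad_scores = -probs
    grad_scores[chosen] += 1.0
    # score_i = tool_emb[i] . query_emb, so d(score_i)/d(tool_emb[i]) = query_emb.
    advantage = reward - baseline
    tool_emb += lr * advantage * np.outer(grad_scores, query_emb)
    return tool_emb
```

Under these assumptions, a success pulls the chosen tool's embedding toward the query and pushes the others away, while a failure does the opposite. The cost per query is one matrix-vector product and one outer product, and the LLM itself is never touched.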

Problem

Research questions and friction points this paper is trying to address.

Addresses embedding misalignment in RAG systems

Improves tool selection accuracy using online optimization

Enables robust tool use with minimal feedback

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online-optimized RAG adapts retrieval embeddings from live interactions

Applies lightweight online gradient updates with negligible latency

Plug-and-play method supports dynamic tool inventories and re-ranking
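
As a rough illustration of the last point, the sketch below shows how a retrieval layer might handle a changing tool inventory together with top-$K$ retrieval and a re-ranking pass. `ToolIndex`, `rerank`, and the `rerank_score` callable are hypothetical names introduced here, and dot-product scoring is an assumption; the paper's actual interfaces are not described on this page.

```python
import numpy as np

class ToolIndex:
    """Tiny in-memory index over tool embeddings (hypothetical helper)."""

    def __init__(self, dim):
        self.dim = dim
        self.names = []
        self.emb = np.zeros((0, dim))

    def add_tool(self, name, vector):
        # New tools can be registered at any time without rebuilding anything.
        vector = np.asarray(vector, dtype=float).reshape(1, self.dim)
        self.names.append(name)
        self.emb = np.vstack([self.emb, vector])

    def remove_tool(self, name):
        i = self.names.index(name)
        self.names.pop(i)
        self.emb = np.delete(self.emb, i, axis=0)

    def top_k(self, query, k):
        # Dot-product scoring is an assumption; return the k best (name, score) pairs.
        scores = self.emb @ np.asarray(query, dtype=float)
        order = np.argsort(-scores)[:k]
        return [(self.names[i], float(scores[i])) for i in order]

def rerank(candidates, rerank_score):
    """Reorder top-K candidates by a second-stage score (assumed callable)."""
    return sorted(candidates, key=lambda c: rerank_score(c[0]), reverse=True)
```

A short usage example under the same assumptions:

```python
index = ToolIndex(dim=2)
index.add_tool("weather", [1.0, 0.0])
index.add_tool("calendar", [0.0, 1.0])
index.add_tool("search", [0.7, 0.7])
candidates = index.top_k([1.0, 0.1], k=2)
index.remove_tool("weather")  # the inventory changes between queries
```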

🔎 Similar Papers

FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research