TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing agent-based RAG approaches improve LLM reliability via reinforcement learning but incur substantial token overhead in retrieval and reasoning, trading efficiency for accuracy. This paper proposes TeaRAG, a token-efficient agentic RAG framework addressing this trade-off. First, it introduces a retrieval compression mechanism integrating knowledge-association graphs with Personalized PageRank, enabling joint semantic chunk retrieval, graph-structured triplet retrieval, and knowledge matching. Second, it proposes Iterative Process-aware Direct Preference Optimization (IP-DPO), which explicitly models and penalizes the number of reasoning steps. Evaluated on six benchmarks, TeaRAG improves average Exact Match by 4% (Llama3-8B-Instruct) and 2% (Qwen2.5-14B-Instruct) while reducing output tokens by 61% and 59%, respectively, demonstrating significant gains in both accuracy and generation efficiency.

📝 Abstract
Retrieval-Augmented Generation (RAG) utilizes external knowledge to augment Large Language Models' (LLMs) reliability. For flexibility, agentic RAG employs autonomous, multi-round retrieval and reasoning to resolve queries. Although recent agentic RAG systems have improved via reinforcement learning, they often incur substantial token overhead from search and reasoning processes, a trade-off that prioritizes accuracy over efficiency. To address this issue, this work proposes TeaRAG, a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. 1) First, the retrieved content is compressed by augmenting chunk-based semantic retrieval with graph retrieval over concise triplets. A knowledge association graph is then built from semantic similarity and co-occurrence. Finally, Personalized PageRank is leveraged to highlight key knowledge within this graph, reducing the number of tokens per retrieval. 2) Besides, to reduce reasoning steps, Iterative Process-aware Direct Preference Optimization (IP-DPO) is proposed. Specifically, our reward function evaluates knowledge sufficiency via a knowledge matching mechanism while penalizing excessive reasoning steps. This design produces high-quality preference-pair datasets, supporting iterative DPO to improve reasoning conciseness. Across six datasets, TeaRAG improves average Exact Match by 4% and 2% while reducing output tokens by 61% and 59% on Llama3-8B-Instruct and Qwen2.5-14B-Instruct, respectively. Code is available at https://github.com/Applied-Machine-Learning-Lab/TeaRAG.
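The abstract's first contribution centers on Personalized PageRank (PPR) highlighting key knowledge in the association graph. A minimal sketch of how PPR concentrates rank mass on query-relevant entities, using a tiny illustrative graph (the node names, edges, and damping factor are assumptions, not the paper's actual graph construction):

```python
# Sketch of Personalized PageRank over a small knowledge-association graph.
# The graph, seed set, and parameters below are illustrative assumptions.

def personalized_pagerank(adj, seeds, alpha=0.85, iters=100, tol=1e-8):
    """Power iteration for PPR. adj: {node: [out-neighbors]}, seeds: teleport set."""
    nodes = list(adj)
    # Teleport distribution concentrated on the query-relevant seed nodes.
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    r = dict(p)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * p[n] for n in nodes}
        for n in nodes:
            outs = adj[n]
            if not outs:  # dangling node: send its mass back to the teleport set
                for m in nodes:
                    nxt[m] += alpha * r[n] * p[m]
            else:
                share = alpha * r[n] / len(outs)
                for m in outs:
                    nxt[m] += share
        if sum(abs(nxt[n] - r[n]) for n in nodes) < tol:
            r = nxt
            break
        r = nxt
    return r

# Entities from retrieved triplets; edges stand in for similarity/co-occurrence.
graph = {
    "Einstein": ["relativity", "Nobel Prize"],
    "relativity": ["Einstein", "spacetime"],
    "Nobel Prize": ["Einstein"],
    "spacetime": ["relativity"],
}
scores = personalized_pagerank(graph, seeds={"Einstein"})
top = max(scores, key=scores.get)
```

Keeping only the top-ranked nodes (and the triplets touching them) is what shrinks the retrieved context, since high-PPR entities are the ones most connected to the query's seed knowledge.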
Problem

Research questions and friction points this paper is trying to address.

Reduces token overhead in agentic RAG systems
Compresses retrieval content using graph-based knowledge association
Minimizes reasoning steps via iterative preference optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph retrieval with triplets compresses retrieval content
IP-DPO reduces reasoning steps via knowledge matching
Personalized PageRank highlights key knowledge in graph
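The IP-DPO idea in the bullets above can be sketched as a trajectory reward: knowledge sufficiency from a matching step, minus a penalty per reasoning step. The function name, penalty weight, and set-based matching here are simplifying assumptions, not the paper's exact formulation:

```python
# Illustrative IP-DPO-style reward: prefer trajectories whose retrieved
# knowledge covers the needed facts in fewer reasoning steps.
# step_penalty and the set-overlap matching are assumed for illustration.

def trajectory_reward(retrieved_facts, gold_facts, n_steps, step_penalty=0.1):
    """Higher when retrieval is sufficient and the reasoning is concise."""
    covered = len(set(retrieved_facts) & set(gold_facts))
    sufficiency = covered / max(len(gold_facts), 1)  # knowledge matching score
    return sufficiency - step_penalty * n_steps       # penalize extra steps

# Two trajectories with equal knowledge coverage but different lengths:
concise = trajectory_reward({"f1", "f2"}, {"f1", "f2"}, n_steps=2)
verbose = trajectory_reward({"f1", "f2"}, {"f1", "f2"}, n_steps=6)
```

Ranking trajectories by such a reward yields (chosen, rejected) preference pairs, where the concise-but-sufficient trajectory is "chosen", which is the kind of dataset iterative DPO then trains on.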
Chao Zhang
University of Science and Technology of China, China and City University of Hong Kong, Hong Kong
Yuhao Wang
City University of Hong Kong, Hong Kong
Derong Xu
University of Science and Technology of China; City University of Hong Kong
Large Language Models · Knowledge Graph · Multimodal Learning
Haoxin Zhang
Xiaohongshu Inc., China
Yuanjie Lyu
University of Science and Technology of China, China
Yuhao Chen
University of Science and Technology of China, China
Shuochen Liu
University of Science and Technology of China
Large Language Model
Tong Xu
University of Science and Technology of China, China
Xiangyu Zhao
City University of Hong Kong, Hong Kong
Yan Gao
Xiaohongshu Inc., China
Yao Hu
Zhejiang University
Machine Learning
Enhong Chen
University of Science and Technology of China
data mining · recommender system · machine learning