TERAG: Token-Efficient Graph-Based Retrieval-Augmented Generation

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current graph-based retrieval-augmented generation (RAG) systems rely heavily on large language models (LLMs) during graph construction, incurring prohibitive token costs that severely hinder scalability. To address this, the authors propose TERAG, a token-efficient framework that combines lightweight graph-structured modeling, personalized PageRank (PPR)-driven subgraph selection in the retrieval phase, and low-token prompting strategies, significantly reducing LLM output token consumption during graph construction. Experiments show the method uses only 3%-11% of the output tokens required by state-of-the-art approaches while retaining at least 80% of their accuracy and generation quality. Key contributions: (i) incorporating PPR into the retrieval pipeline to enable semantic-aware, lightweight subgraph extraction; and (ii) structured prompt design that minimizes redundant LLM output. This work establishes a high-efficiency, low-cost pathway toward scalable graph RAG systems.
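The structured prompt design mentioned above could look something like the following sketch. The paper's actual prompts are not shown here; this is a hypothetical illustration of the general idea, asking the model for bare pipe-delimited triples so that output tokens stay minimal:

```python
# Hypothetical low-token extraction prompt (not the paper's actual prompt):
# the model is instructed to emit only bare (head, relation, tail) triples,
# one per line, with no explanation, keeping output token counts small.
PROMPT = (
    "Extract entity triples from the passage below.\n"
    "Output one triple per line as: head|relation|tail\n"
    "No other text.\n\n"
    "Passage: {passage}"
)

def parse_triples(llm_output: str) -> list[tuple[str, str, str]]:
    """Parse the pipe-delimited triples the prompt requests."""
    triples = []
    for line in llm_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:  # skip malformed or empty lines
            triples.append(tuple(parts))
    return triples

# Example of what a compliant model response might look like:
sample = "Einstein|developed|relativity\nNewton|formulated|gravity"
print(parse_triples(sample))
```

The compact output format is the point: every token the model is *not* asked to produce (explanations, JSON scaffolding, restated instructions) is a token saved during graph construction.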

📝 Abstract
Graph-based retrieval-augmented generation (RAG) has become a widely studied approach for improving the reasoning, accuracy, and factuality of large language models (LLMs). However, many existing graph-based RAG systems overlook the high cost of LLM token usage during graph construction, hindering large-scale adoption. To address this, we propose TERAG, a simple yet effective framework designed to build informative graphs at a significantly lower cost. Inspired by HippoRAG, we incorporate Personalized PageRank (PPR) during the retrieval phase, and we achieve at least 80% of the accuracy of widely used graph-based RAG methods while consuming only 3%-11% of the output tokens.
Problem

Research questions and friction points this paper is trying to address.

Reducing token usage costs in graph-based RAG systems
Improving scalability of graph construction for large-scale adoption
Maintaining high accuracy while minimizing LLM output tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-efficient graph construction framework (TERAG)
Personalized PageRank (PPR) for subgraph selection in the retrieval phase
Reduces LLM output token usage by 89%-97% relative to widely used graph-based RAG methods
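The PPR-based retrieval step can be sketched as follows. This is a minimal pure-Python power-iteration sketch over a toy graph, not the paper's implementation: the graph, the node names, and the seeding scheme (teleport mass placed on entities matched from the query) are all illustrative assumptions.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, iters=50):
    """Power-iteration PPR on an undirected graph given as edge pairs.

    `seeds` maps query-matched nodes to teleport mass; all restart
    probability returns to these nodes, keeping scores query-focused.
    """
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = sum(seeds.values())
    seed = {n: w / total for n, w in seeds.items()}
    score = {n: seed.get(n, 0.0) for n in nbrs}
    for _ in range(iters):
        nxt = {n: (1 - alpha) * seed.get(n, 0.0) for n in nbrs}
        for u, out in nbrs.items():
            share = alpha * score[u] / len(out)  # spread mass to neighbors
            for v in out:
                nxt[v] += share
        score = nxt
    return score

# Toy graph mixing entity and passage nodes (names are illustrative):
edges = [
    ("Einstein", "relativity"), ("relativity", "passage_1"),
    ("Einstein", "passage_2"),
    ("Newton", "gravity"), ("gravity", "passage_3"),
]
# Seed on the entity matched from the query; rank passage nodes by PPR score.
scores = personalized_pagerank(edges, seeds={"Einstein": 1.0})
top = max((n for n in scores if n.startswith("passage_")), key=scores.get)
print(top)  # the passage adjacent to the seed entity ranks first
```

Passages near the seed entities accumulate random-walk mass, while unrelated components score near zero, which is what makes PPR a cheap, LLM-free way to pick a query-relevant subgraph at retrieval time.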