🤖 AI Summary
Current graph-based retrieval-augmented generation (RAG) systems rely heavily on large language models (LLMs) during graph construction, incurring prohibitive token costs that severely hinder scalability. To address this, we propose a token-efficient framework integrating graph-structured modeling, personalized PageRank (PPR)-driven subgraph selection, and low-token prompting strategies, significantly reducing LLM output-token consumption during graph construction. Experiments demonstrate that our method uses only 3%–11% of the output tokens required by state-of-the-art approaches while maintaining over 80% of their factual accuracy and generation quality. Our key contributions are: (i) integrating PPR into the retrieval pipeline to enable semantic-aware, lightweight subgraph extraction; and (ii) a structured prompt design that minimizes redundant LLM output. This work establishes a high-efficiency, low-cost path toward scalable graph RAG systems.
📝 Abstract
Graph-based retrieval-augmented generation (RAG) has become a widely studied approach for improving the reasoning, accuracy, and factuality of large language models (LLMs). However, many existing graph-based RAG systems overlook the high cost of LLM token usage during graph construction, which hinders large-scale adoption. To address this, we propose TERAG, a simple yet effective framework for building informative graphs at significantly lower cost. Inspired by HippoRAG, we incorporate Personalized PageRank (PPR) during the retrieval phase, achieving at least 80% of the accuracy of widely used graph-based RAG methods while consuming only 3%–11% of the output tokens.
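To make the retrieval step concrete: personalized PageRank biases the random walk's teleport step toward query-relevant seed nodes, so high-scoring nodes form a lightweight, query-aware subgraph. The sketch below is illustrative only, not TERAG's actual implementation; the toy graph, seed set, and parameters are hypothetical.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power iteration for PPR.

    graph: dict mapping node -> list of out-neighbors
    seeds: query-relevant nodes that receive all teleport mass
    alpha: damping factor (probability of following an edge)
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        # Each node keeps (1 - alpha) of the teleport distribution...
        nxt = {n: (1 - alpha) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                # ...dangling nodes send their walk mass back to the seeds...
                for s in seeds:
                    nxt[s] += alpha * rank[n] / len(seeds)
            else:
                # ...and every other node splits its mass over out-edges.
                share = alpha * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
        rank = nxt
    return rank

def top_k_subgraph(graph, seeds, k=3):
    """Keep only the k highest-PPR nodes and the edges among them."""
    scores = personalized_pagerank(graph, seeds)
    keep = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return {n: [m for m in graph[n] if m in keep] for n in keep}
```

In a TERAG-style pipeline the seeds would come from entities linked to the query, and only this small top-ranked subgraph is passed to the LLM as context, which is what keeps output-token consumption low relative to methods that prompt the LLM over much larger graph neighborhoods.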