🤖 AI Summary
Retrieval-Augmented Generation (RAG) suffers from noisy, highly redundant retrieved content and insufficient exploitation of fine-grained inter-document semantic relationships.
Method: This paper proposes a dynamic clustering–driven document compression framework. It introduces a novel, efficient dynamic clustering mechanism that adaptively models implicit semantic associations among documents during compression. The framework jointly integrates dynamic similarity modeling and hierarchical content distillation to balance information fidelity and inference efficiency.
Contribution/Results: Implemented atop GPT-3.5, the framework significantly reduces redundancy rates and error propagation. It improves robustness and generalization across multi-task benchmarks—including knowledge-intensive question answering and hallucination detection—demonstrating strong cross-scenario applicability.
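The clustering-then-distillation idea in the method description can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's implementation: it uses bag-of-words cosine similarity in place of neural embeddings, a greedy single-pass threshold clustering in place of the paper's dynamic clustering mechanism, and simply keeps one representative per cluster in place of hierarchical content distillation.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; the actual framework presumably uses
    # learned semantic embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_documents(docs, threshold=0.6):
    """Greedy single-pass clustering: each document joins the first
    cluster whose representative is similar enough, else starts a new
    cluster. `threshold` is a hypothetical tuning knob."""
    clusters = []  # list of (representative_embedding, member_docs)
    for doc in docs:
        vec = embed(doc)
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(doc)
                break
        else:
            clusters.append((vec, [doc]))
    return [members for _, members in clusters]

def compress(docs, threshold=0.6):
    # Stand-in for content distillation: keep one representative per
    # cluster, dropping near-duplicate retrieved passages.
    return [members[0] for members in cluster_documents(docs, threshold)]

docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",       # near-duplicate of the first
    "quantum computing uses qubits",
]
print(compress(docs))  # two documents survive; the duplicate is dropped
```

Even this crude version shows the intended effect: redundant retrieved passages collapse into one cluster, so the context passed to the LLM is shorter and less repetitive.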
📝 Abstract
In recent years, Retrieval-Augmented Generation (RAG) has emerged as a widely adopted approach for integrating external knowledge during large language model (LLM) inference. However, current RAG implementations struggle to address noise, repetition, and redundancy in retrieved content, primarily because of their limited ability to exploit fine-grained inter-document relationships. To address these limitations, we propose an Efficient Dynamic Clustering-based document Compression framework (EDC²-RAG) that exploits latent inter-document relationships while removing irrelevant and redundant content. We validate our approach, built upon GPT-3.5, on widely used knowledge-QA and hallucination-detection datasets. The results show that our method achieves consistent performance improvements across various scenarios and experimental settings, demonstrating strong robustness and applicability. Our code and datasets can be found at https://github.com/Tsinghua-dhy/EDC-2-RAG.