🤖 AI Summary
In Retrieval-Augmented Generation (RAG), retrieved evidence often contains redundant or irrelevant content, or information incompatible with the target large language model’s internal knowledge, which degrades generation quality. To address this, we propose a training-free, familiarity-aware evidence compression method that leverages attention mechanisms and implicit knowledge alignment to compress external evidence into a compact form that is highly compatible with the model’s internal representations, so that external evidence and internal parametric knowledge work together without any added parameters. The approach requires no fine-tuning or auxiliary training. Evaluated on multiple open-domain question answering benchmarks, it improves accuracy by up to 28.1% over strong baselines while maintaining high compression rates, outperforming existing evidence compression methods.
📝 Abstract
Retrieval-augmented generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieved from external sources. However, it often struggles to cope with inconsistent and irrelevant information that can distract the LM from its task, especially when multiple evidence pieces are required. Compressing the retrieved evidence with a compression model aims to address this issue, but the compressed evidence may still be unfamiliar to the target model used for downstream tasks, which may then fail to use the evidence effectively. We propose FaviComp (Familiarity-Aware Evidence Compression), a novel training-free evidence compression technique that makes retrieved evidence more familiar to the target model while seamlessly integrating parametric knowledge from the model. Experimental results show that FaviComp consistently outperforms most recent evidence compression baselines across multiple open-domain QA datasets, improving accuracy by up to 28.1% while achieving high compression rates. Additionally, we demonstrate the effective integration of both parametric and non-parametric knowledge during evidence compression.