Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective

📅 2026-01-25
🤖 AI Summary
Existing soft compression methods in retrieval-augmented generation (RAG) compress the full context without considering query relevance, which degrades performance and incurs high computational overhead. This work proposes SeleCom, a framework that introduces, for the first time, a query-conditioned selection mechanism: it reformulates the compression encoder as a decoder-only context selector and trains it via curriculum learning on large-scale, difficulty-tiered synthetic question-answering data. This design mitigates both the behavioral conflict between compression and generation and the dilution of task-relevant information. Experiments show that SeleCom significantly outperforms existing soft compression methods, matching or exceeding non-compressed RAG baselines across multiple tasks while reducing computational cost and latency by 33.8%–84.6%.
📝 Abstract
Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent work on soft context compression aims to address this by encoding long documents into compact embeddings, yet such methods often underperform non-compressed RAG due to their reliance on auto-encoder-like full compression, which forces the encoder to compress all document information regardless of its relevance to the input query. In this work, we analyze this paradigm and reveal two fundamental limitations: (I) Infeasibility: full compression conflicts with the LLM's downstream generation behavior; and (II) Non-necessity: full compression is unnecessary and dilutes task-relevant information density. Motivated by these insights, we introduce SeleCom, a selector-based soft compression framework for RAG that redefines the encoder's role as a query-conditioned information selector. The selector is decoder-only and is trained with a massive, diverse, and difficulty-graded synthetic QA dataset via curriculum learning. Extensive experiments show that SeleCom significantly outperforms existing soft compression approaches and achieves competitive or superior performance relative to non-compression baselines, while reducing computation and latency by 33.8%–84.6%.
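The abstract's core idea, keeping only the document content relevant to the query instead of compressing everything, can be illustrated with a minimal sketch. Everything below (the `embed` helper, `DIM`, the cosine-similarity scoring, the token budget) is a hypothetical stand-in: the paper's actual selector is a trained decoder-only LLM that emits soft embeddings, not a word-level similarity heuristic.

```python
import math

DIM = 8  # toy embedding width (illustrative only)

def embed(token: str) -> list[float]:
    # Deterministic toy embedding; a stand-in for the selector's hidden
    # states. Identical tokens map to identical unit vectors.
    seed = sum(ord(c) for c in token.lower())
    vec = [math.sin(seed * (i + 1)) for i in range(DIM)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def query_conditioned_select(query: str, document: str, budget: int) -> list[str]:
    """Keep only the `budget` document tokens most relevant to the query,
    rather than compressing the full context."""
    q_vecs = [embed(t) for t in query.split()]
    doc_tokens = document.split()
    # Score each document token by its best cosine similarity to any
    # query token (dot product suffices since embed() returns unit vectors).
    scores = []
    for i, tok in enumerate(doc_tokens):
        d = embed(tok)
        scores.append((max(sum(a * b for a, b in zip(d, q)) for q in q_vecs), i))
    # Take the top-`budget` indices, then restore original document order.
    kept = sorted(i for _, i in sorted(scores, reverse=True)[:budget])
    return [doc_tokens[i] for i in kept]
```

Under this toy scoring, query words that literally appear in the document always score 1.0 and survive selection; the learned selector in the paper generalizes this to semantic relevance.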
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation, soft compression, query-conditioned selection, context redundancy, information density
Innovation

Methods, ideas, or system contributions that make the work stand out.

selector-based compression, query-conditioned selection, soft context compression, retrieval-augmented generation, curriculum learning
Authors

Yunhao Liu (ACM Fellow, IEEE Fellow, CCF Fellow, Tsinghua University). Research interests: Wireless Sensor Networks/RFID, Cyber Physical Systems and IoT, Privacy and Security, Cloud Computing
Zian Jia (Shanghai Key Laboratory of Data Science, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China)
Xinyu Gao (Nanjing University). Research interests: Autonomous Driving, Multi-sensor Fusion, Testing
Kanjun Xu (Shanghai Key Laboratory of Data Science, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China)
Yun Xiong (Shanghai Key Laboratory of Data Science, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China)