🤖 AI Summary
This work addresses a key limitation in retrieval-augmented generation (RAG). When retrieved documents are mixed with generated context, large language models often favor fluent but hallucinated content over factually accurate yet poorly organized retrieved documents, so relevant evidence goes underused. To mitigate this, the authors propose QREAM, a framework that uses question-guided, style-controlled rewriting to restructure retrieved passages into a question-oriented style while preserving their factual content. QREAM operates in two stages. First, iterative in-context learning guided by stylistic seeds (QREAM-ICL) explores style-aligned rewrites. Second, a lightweight student model (QREAM-FT) is distilled from the denoised ICL outputs, with rejection sampling under two criteria: answer correctness and factual consistency. Experiments show that QREAM, used as a plug-and-play module, improves advanced RAG pipelines by up to 8% (relative) with negligible latency overhead, balancing question relevance with factual grounding.
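The dual-criteria filter is described only at a high level, so what follows is a minimal Python sketch under stated assumptions, not QREAM's implementation. It assumes that "answer correctness" means a normalized exact match between a reader LLM's answer and a gold answer, and it substitutes a crude token-overlap score for "factual consistency". The `Candidate` class, the `threshold=0.8` cutoff, and all function names are hypothetical; a stronger consistency checker (for example, an entailment model) could replace the overlap proxy.

```python
import re
import string
from dataclasses import dataclass


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_correct(prediction: str, gold_answers: list[str]) -> bool:
    """Criterion 1 (assumed form): exact match against any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(g) for g in gold_answers)


def factually_consistent(rewrite: str, source: str, threshold: float = 0.8) -> bool:
    """Criterion 2 (hypothetical proxy): share of the rewrite's tokens that also occur in the source."""
    rewrite_tokens = set(normalize(rewrite).split())
    if not rewrite_tokens:
        return False
    source_tokens = set(normalize(source).split())
    support = len(rewrite_tokens & source_tokens) / len(rewrite_tokens)
    return support >= threshold


@dataclass
class Candidate:
    rewrite: str        # a rewritten passage proposed during exploration
    reader_answer: str  # the answer a reader LLM gave when shown this rewrite


def rejection_sample(source: str, gold_answers: list[str],
                     candidates: list[Candidate], threshold: float = 0.8) -> list[Candidate]:
    """Keep only candidates passing both criteria; survivors would serve as distillation targets."""
    return [
        c for c in candidates
        if answer_correct(c.reader_answer, gold_answers)
        and factually_consistent(c.rewrite, source, threshold)
    ]


# Toy usage with made-up data.
source = "The Eiffel Tower was completed in 1889 for the World's Fair in Paris."
gold = ["1889"]
candidates = [
    Candidate("The Eiffel Tower was completed in 1889 in Paris.", "1889"),
    Candidate("The Eiffel Tower was completed in 1887 in Paris.", "1887"),
    Candidate("Gustave Eiffel designed the tower, completed in 1889.", "1889"),
]
kept = rejection_sample(source, gold, candidates)
for c in kept:
    print(c.rewrite)
```

In this toy run only the first candidate survives: the second is rejected because the reader's answer is wrong, and the third because too many of its tokens ("gustave", "designed") are not supported by the source passage.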
📝 Abstract
Retrieval-Augmented Generation (RAG) enhances the factuality of Large Language Models (LLMs) by incorporating retrieved documents and/or generated context. However, LLMs often exhibit a stylistic bias when presented with mixed contexts, favoring fluent but hallucinated generated content over factually grounded yet disorganized retrieved evidence. This phenomenon reveals that the utility of retrieved information is bottlenecked by its presentation. To bridge this gap, we propose QREAM, a style-controlled rewriter that aligns retrieved documents with a question-oriented style while preserving facts, making them easier for LLM readers to use. Our framework consists of two stages: (1) QREAM-ICL, which uses stylistic seeds to guide iterative rewriting exploration; and (2) QREAM-FT, a lightweight student model distilled from denoised ICL outputs. QREAM-FT employs dual-criteria rejection sampling, filtering candidates on answer correctness and factual consistency to ensure high-quality supervision. QREAM integrates seamlessly into existing RAG pipelines as a plug-and-play module. Experiments demonstrate that QREAM consistently enhances advanced RAG pipelines, yielding up to 8% relative improvement with negligible latency overhead, effectively balancing question relevance with factual grounding.
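To make "plug-and-play" concrete, here is a minimal Python sketch assuming a pipeline built from three callables. The `rag_answer` function, the type aliases, and the toy components are hypothetical placeholders, not QREAM's interface. What the sketch illustrates is the placement: the rewriter sits between retrieval and reading and calls neither, so it can be switched on or off without modifying the retriever or the reader.

```python
from typing import Callable, Optional

# Hypothetical component signatures; real retrievers, rewriters, and readers will differ.
Retriever = Callable[[str], list[str]]       # question -> passages
Rewriter = Callable[[str, str], str]         # (question, passage) -> rewritten passage
Reader = Callable[[str, list[str]], str]     # (question, passages) -> answer


def rag_answer(question: str, retrieve: Retriever, read: Reader,
               rewrite: Optional[Rewriter] = None) -> str:
    """Run retrieve -> (optional rewrite) -> read.

    The retriever and reader are called the same way whether or not a rewriter is given.
    """
    passages = retrieve(question)
    if rewrite is not None:
        passages = [rewrite(question, p) for p in passages]
    return read(question, passages)


# Toy stand-ins so the sketch runs end to end.
def toy_retrieve(question: str) -> list[str]:
    return ["paris hosted the 1889 world's fair; the eiffel tower was finished that year"]


def toy_rewrite(question: str, passage: str) -> str:
    # Placeholder for a style-controlled rewriter: it only frames the passage around the question.
    return f"Regarding '{question}': {passage}"


def toy_read(question: str, passages: list[str]) -> str:
    # Placeholder reader that simply returns the context it would condition on.
    return " | ".join(passages)


q = "When was the Eiffel Tower completed?"
print(rag_answer(q, toy_retrieve, toy_read))
print(rag_answer(q, toy_retrieve, toy_read, rewrite=toy_rewrite))
```

Because the rewriter is an optional argument, comparing a pipeline with and without it only requires passing or omitting one callable.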