Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment

πŸ“… 2026-03-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses a core challenge of long-context retrieval-augmented generation (RAG): as information density grows, critical evidence is drowned in noise, leading to attention dilution and reasoning hallucinations. To this end, the authors propose Hit-RAG, presented as the first three-stage progressive preference alignment framework, which combines supervised fine-tuning (SFT), discriminative preference alignment (DPA), and group relative policy optimization (GRPO) to systematically strengthen the model's ability to identify salient evidence and reason accurately over lengthy contexts. Experiments show that Hit-RAG significantly outperforms existing methods across eight benchmarks, effectively mitigating attention dilution and reasoning collapse. Notably, it enables smaller models to surpass larger counterparts on complex long-context reasoning tasks, bridging the gap between context acquisition and precise reasoning.
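The DPA stage above is described only at a high level. A minimal sketch of a standard pairwise preference objective (a DPO-style loss, not the authors' exact formulation; the function name `dpa_preference_loss` and its arguments are hypothetical) illustrates the general idea of preferring evidence-grounded answers over distractor-led ones:

```python
import math

def dpa_preference_loss(logp_chosen, logp_rejected,
                        ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise preference loss in the style of DPO.

    Hypothetical sketch: the paper does not specify the DPA objective,
    so this uses the standard direct-preference form as a stand-in.
    """
    # Implicit reward margin: how much more the policy (relative to a
    # frozen reference model) favors the evidence-grounded answer over
    # the distractor-led one.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written stably via log1p.
    return math.log1p(math.exp(-margin))

# Policy favors the grounded answer more than the reference does,
# so the loss falls below log(2) ~ 0.693.
print(dpa_preference_loss(-1.0, -2.0, -1.5, -1.5))
```

Driving this loss down widens the policy's implicit reward margin between the preferred and rejected responses, which is the mechanism a DPA-like stage would use to build robustness against misleading distractors.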

πŸ“ Abstract
Despite the promise of Retrieval-Augmented Generation in grounding Multimodal Large Language Models with external knowledge, the transition to extensive contexts often leads to significant attention dilution and reasoning hallucinations. The surge in information density causes critical evidence to be submerged by voluminous noise, which complicates the discernment of relevant fragments within a dense input. In this paper, we propose **Hit-RAG**, a multi-stage preference alignment framework designed to resolve these cognitive bottlenecks through a progressive optimization pipeline. Our approach systematically refines the utilization of external evidence via three distinct stages. First, Supervised Fine-tuning establishes baseline context awareness to minimize information neglect. Next, Discriminative Preference Alignment enhances robustness against misleading distractors. Finally, Group-Relative Policy Optimization stabilizes logical synthesis to prevent reasoning collapse. Extensive evaluations on eight benchmarks demonstrate that Hit-RAG consistently yields substantial performance gains, enabling models to bridge the gap between context acquisition and accurate reasoning while surpassing much larger counterparts in long-context scenarios.
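The final GRPO stage is a known technique whose defining step is critic-free advantage estimation: rewards for a group of responses sampled from the same prompt are normalized against the group's own mean and standard deviation. A minimal sketch of that computation (illustrative only; the function name and reward values are made up, not taken from the paper):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of sampled responses.

    Each response's advantage is its reward standardized within the
    group, (r_i - mean) / (std + eps), so no learned value critic is
    needed. Hypothetical sketch, not the authors' implementation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one long-context question, scored by some
# reward function (values are illustrative).
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])  # → [1.414, -1.414, 0.0, 0.0]
```

Because the advantages are centered within each group, responses are pushed apart only relative to their siblings for the same prompt, which is what lets GRPO stabilize policy updates without a separate value model.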
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Attention Dilution
Reasoning Hallucinations
Long-context Reasoning
Information Noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Preference Alignment
Long-context Reasoning
Hallucination Mitigation
Policy Optimization
πŸ”Ž Similar Papers
No similar papers found.