Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment

πŸ“… 2026-03-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses a core challenge of long-context retrieval-augmented generation (RAG): as information density grows, critical evidence is drowned in noise, leading to attention dilution and reasoning hallucinations. To this end, the authors propose Hit-RAG, presented as the first three-stage progressive preference alignment framework, which combines supervised fine-tuning (SFT), discriminative preference alignment (DPA), and group relative policy optimization (GRPO) to systematically strengthen the model's ability to identify salient evidence and reason accurately over lengthy contexts. Experiments show that Hit-RAG significantly outperforms existing methods across eight benchmarks, effectively mitigating attention dilution and reasoning collapse. Notably, it enables smaller models to surpass larger counterparts on complex long-context reasoning tasks, bridging the gap between context acquisition and precise reasoning.
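The DPA stage above is described only at a high level. A minimal sketch of a standard pairwise preference objective (a DPO-style loss, not the authors' exact formulation; the function name `dpa_preference_loss` and its arguments are hypothetical) illustrates the general idea of preferring evidence-grounded answers over distractor-led ones:

```python
import math

def dpa_preference_loss(logp_chosen, logp_rejected,
                        ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise preference loss in the style of DPO.

    Hypothetical sketch: the paper does not specify the DPA objective,
    so this uses the standard direct-preference form as a stand-in.
    """
    # Implicit reward margin: how much more the policy (relative to a
    # frozen reference model) favors the evidence-grounded answer over
    # the distractor-led one.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written stably via log1p.
    return math.log1p(math.exp(-margin))

# Policy favors the grounded answer more than the reference does,
# so the loss falls below log(2) ~ 0.693.
print(dpa_preference_loss(-1.0, -2.0, -1.5, -1.5))
```

Driving this loss down widens the policy's implicit reward margin between the preferred and rejected responses, which is the mechanism a DPA-like stage would use to build robustness against misleading distractors.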

πŸ“ Abstract
Despite the promise of Retrieval-Augmented Generation in grounding Multimodal Large Language Models with external knowledge, the transition to extensive contexts often leads to significant attention dilution and reasoning hallucinations. The surge in information density causes critical evidence to be submerged by voluminous noise, which complicates the discernment of relevant fragments within a dense input. In this paper, we propose **Hit-RAG**, a multi-stage preference alignment framework designed to resolve these cognitive bottlenecks through a progressive optimization pipeline. Our approach systematically refines the utilization of external evidence via three distinct stages. First, Supervised Fine-tuning establishes baseline context awareness to minimize information neglect. Next, Discriminative Preference Alignment enhances robustness against misleading distractors. Finally, Group-Relative Policy Optimization stabilizes logical synthesis to prevent reasoning collapse. Extensive evaluations on eight benchmarks demonstrate that Hit-RAG consistently yields substantial performance gains, enabling models to bridge the gap between context acquisition and accurate reasoning while surpassing much larger counterparts in long-context scenarios.
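The final GRPO stage is a known technique whose defining step is critic-free advantage estimation: rewards for a group of responses sampled from the same prompt are normalized against the group's own mean and standard deviation. A minimal sketch of that computation (illustrative only; the function name and reward values are made up, not taken from the paper):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of sampled responses.

    Each response's advantage is its reward standardized within the
    group, (r_i - mean) / (std + eps), so no learned value critic is
    needed. Hypothetical sketch, not the authors' implementation.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one long-context question, scored by some
# reward function (values are illustrative).
rewards = [1.0, 0.0, 0.5, 0.5]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])  # → [1.414, -1.414, 0.0, 0.0]
```

Because the advantages are centered within each group, responses are pushed apart only relative to their siblings for the same prompt, which is what lets GRPO stabilize policy updates without a separate value model.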
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Attention Dilution
Reasoning Hallucinations
Long-context Reasoning
Information Noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Preference Alignment
Long-context Reasoning
Hallucination Mitigation
Policy Optimization
πŸ”Ž Similar Papers
No similar papers found.