ARK: Answer-Centric Retriever Tuning via KG-augmented Curriculum Learning

📅 2025-11-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Retrievers in knowledge-intensive tasks struggle to identify sparse, critical evidence, especially under long-context settings, due to weak answer alignment. Method: This paper proposes an answer-alignment-driven retriever optimization framework. Its core innovation is the first use of answer sufficiency assessment as a supervisory signal, coupled with a knowledge graph (KG)-enhanced curriculum-style contrastive learning paradigm that automatically generates progressively challenging negative samples. This shifts retrieval from query similarity toward answer-generation guidance. The method integrates KG-guided query augmentation and hard negative mining without architectural modifications. Results: Evaluated on 10 datasets across the UltraDomain and LongBench benchmarks, the approach achieves state-of-the-art performance, with an average improvement of 14.5% over the base retriever. It significantly enhances generalization and inference efficiency while maintaining robustness across diverse domains and context lengths.
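The answer-sufficiency signal described above can be sketched as a filter over candidate chunks: keep as positives only those chunks that, on their own, suffice to produce the gold answer. The paper uses an LLM-based judge for this assessment; the sketch below substitutes a simple, hypothetical token-coverage proxy (`answer_sufficient` and its `threshold` are illustrative names, not the paper's implementation).

```python
def answer_sufficient(chunk: str, question: str, answer: str,
                      threshold: float = 0.8) -> bool:
    """Judge whether a chunk alone suffices to generate the gold answer.

    The paper scores sufficiency with an LLM judge; here we stand in a
    crude proxy: the fraction of answer tokens covered by the chunk.
    """
    answer_tokens = set(answer.lower().split())
    chunk_tokens = set(chunk.lower().split())
    if not answer_tokens:
        return False
    coverage = len(answer_tokens & chunk_tokens) / len(answer_tokens)
    return coverage >= threshold


def select_positives(chunks: list[str], question: str, answer: str) -> list[str]:
    """Keep only answer-sufficient chunks as positives for contrastive tuning."""
    return [c for c in chunks if answer_sufficient(c, question, answer)]


chunks = [
    "Paris is the capital of France.",
    "France is in Europe.",
]
print(select_positives(chunks, "What is the capital of France?", "Paris"))
# -> ['Paris is the capital of France.']
```

Chunks that fail the sufficiency check are not discarded; under the curriculum scheme they remain candidates for negative mining.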

📝 Abstract
Retrieval-Augmented Generation (RAG) has emerged as a powerful framework for knowledge-intensive tasks, yet its effectiveness in long-context scenarios is often bottlenecked by the retriever's inability to distinguish sparse yet crucial evidence. Standard retrievers, optimized for query-document similarity, frequently fail to align with the downstream goal of generating a precise answer. To bridge this gap, we propose a novel fine-tuning framework that optimizes the retriever for Answer Alignment. Specifically, we first identify high-quality positive chunks by evaluating their sufficiency to generate the correct answer. We then employ a curriculum-based contrastive learning scheme to fine-tune the retriever. This curriculum leverages LLM-constructed Knowledge Graphs (KGs) to generate augmented queries, which in turn mine progressively challenging hard negatives. This process trains the retriever to distinguish the answer-sufficient positive chunks from these nuanced distractors, enhancing its generalization. Extensive experiments on 10 datasets from the UltraDomain and LongBench benchmarks demonstrate that our fine-tuned retriever achieves state-of-the-art performance, improving by 14.5% over the base model without substantial architectural modifications and maintaining strong efficiency for long-context RAG. Our work presents a robust and effective methodology for building truly answer-centric retrievers.
Problem

Research questions and friction points this paper is trying to address.

Optimizing retrievers for answer alignment in long-context RAG scenarios
Distinguishing sparse crucial evidence from challenging hard negatives
Enhancing generalization via KG-augmented curriculum contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes retriever for answer alignment using curriculum learning
Leverages knowledge graphs to generate augmented queries
Employs contrastive learning with hard negatives for generalization
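The curriculum contrastive scheme in the bullets above can be sketched with a standard InfoNCE-style loss plus a staged negative pool that widens from easy to hard negatives. This is a minimal illustration, not the paper's training code: the function names, the difficulty-bucketed negative pool, and the temperature value are all assumptions.

```python
import math


def info_nce_loss(sim_pos: float, sim_negs: list[float],
                  temperature: float = 0.05) -> float:
    """Contrastive (InfoNCE) loss for one query: pull the answer-sufficient
    positive chunk closer, push the mined negatives away."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)


def curriculum_negatives(negatives_by_difficulty: list[list[str]],
                         stage: int) -> list[str]:
    """At curriculum stage t, draw negatives from all difficulty levels
    up to t, so training progresses from easy to hard distractors."""
    pool: list[str] = []
    for level in range(stage + 1):
        pool.extend(negatives_by_difficulty[level])
    return pool
```

As intended, the loss grows as negatives become more similar to the query (harder), which is what drives the retriever to separate answer-sufficient chunks from nuanced distractors as the curriculum advances.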
Jiawei Zhou (ACEM, Shanghai Jiao Tong University)
Hang Ding (ACEM, Shanghai Jiao Tong University)
Haiyun Jiang (Associate Professor, Shanghai Jiao Tong University; (multimodal) large models, intelligent target recognition, knowledge graphs)