🤖 AI Summary
In multimodal RAG, reranker training is hindered by low-quality hard negatives, which exhibit insufficient diversity, inadequate difficulty, frequent false negatives, and poor controllability. To address this, we propose a “page-centric” paradigm for generating single-page hard negative queries: given a document page and its positive queries, we employ a collaborative LLM-VLM pipeline to synthesize semantically similar yet unanswerable queries, i.e., queries the page cannot answer. Our approach integrates query rephrasing, multimodal semantic consistency constraints, and automated false-negative verification. This framework significantly improves the hardness, diversity, and authenticity of hard negatives while enabling fine-grained, controllable generation. Evaluated on established multimodal RAG benchmarks, a reranker trained with our negatives achieves substantial improvements in retrieval accuracy and robustness over state-of-the-art methods.
📝 Abstract
Rerankers play a critical role in multimodal Retrieval-Augmented Generation (RAG) by refining the ranking of an initial set of retrieved documents. They are typically trained with hard negative mining, which selects, for each query, pages that rank high but are actually irrelevant. However, this selection process is passive and restricted to what the retriever can find in the available corpus, leading to several inherent limitations: limited diversity, negatives that are often not hard enough, low controllability, and frequent false negatives that harm training. Our paper proposes an alternative approach, Single-Page Hard Negative Query Generation, which inverts this process: instead of retrieving negative pages per query, we generate hard negative queries per page. Given a page and one of its positive queries, an automated LLM-VLM pipeline creates hard negatives by rephrasing the query to be as similar as possible in form and context, yet unanswerable from the page. This paradigm enables fine-grained control over the generated queries, resulting in diverse, hard, and targeted negatives, and it supports efficient false-negative verification. Our experiments show that rerankers trained with data generated by our approach outperform existing models and significantly improve retrieval performance.
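The generate-then-verify loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `generate_hard_negatives`, `rephrase` (standing in for the LLM rephrasing step), and `can_answer` (standing in for the VLM answerability check used for false-negative verification) are all hypothetical.

```python
def generate_hard_negatives(page, positive_query, rephrase, can_answer,
                            n=3, max_tries=10):
    """Generate queries similar to positive_query that `page` cannot answer.

    rephrase(query)         -> a candidate query, close in form and context
                               (placeholder for the LLM rephrasing step).
    can_answer(page, query) -> bool, whether the page answers the query
                               (placeholder for the VLM false-negative check).
    """
    negatives = []
    for _ in range(max_tries):
        candidate = rephrase(positive_query)
        # Keep a candidate only if it differs from the positive query and
        # the answerability check confirms the page cannot answer it.
        if candidate != positive_query and not can_answer(page, candidate):
            negatives.append(candidate)
        if len(negatives) == n:
            break
    return negatives

# Toy usage with stub callables in place of the LLM and VLM:
page = {"text": "Revenue in 2020 was $5M."}
variants = iter([
    "What was revenue in 2021?",   # page cannot answer -> kept
    "What was revenue in 2020?",   # duplicates the positive -> skipped
    "What was profit in 2020?",    # page cannot answer -> kept
])
rephrase = lambda q: next(variants)
can_answer = lambda p, q: "2020" in q and "revenue" in q.lower()

negatives = generate_hard_negatives(
    page, "What was revenue in 2020?", rephrase, can_answer, n=2)
# -> ['What was revenue in 2021?', 'What was profit in 2020?']
```

The key design point is that generation is active and per-page: candidates are produced to order rather than mined from a corpus, and the verification step filters false negatives before they ever reach training.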