From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on Retrieval-Augmented Generation systems

📅 2025-07-10

🤖 AI Summary

This work investigates the detrimental impact of coreference complexity on retrieval-augmented generation (RAG) systems: coreferential ambiguity degrades retrieval relevance, impedes contextual understanding, and reduces generation quality. To address this, we propose a lightweight, RAG-specific coreference resolution method and systematically evaluate its synergy with various pooling strategies—particularly mean pooling—for retrieval optimization. Experiments demonstrate substantial improvements in retrieval accuracy and question-answering performance, especially for knowledge-intensive tasks, where small language models achieve an average +4.2% Exact Match gain. Our key contribution is the first empirical demonstration that coreference resolution critically enables joint retrieval-generation optimization in RAG, and we further validate its effectiveness and deployability under resource-constrained conditions.

📝 Abstract

Retrieval-Augmented Generation (RAG) has emerged as a crucial framework in natural language processing (NLP), improving factual consistency and reducing hallucinations by integrating external document retrieval with large language models (LLMs). However, the effectiveness of RAG is often hindered by coreferential complexity in retrieved documents, which introduces ambiguity that disrupts in-context learning. In this study, we systematically investigate how entity coreference affects both document retrieval and generative performance in RAG-based systems, focusing on retrieval relevance, contextual understanding, and overall response quality. We demonstrate that coreference resolution enhances retrieval effectiveness and improves question-answering (QA) performance. Through a comparative analysis of pooling strategies in retrieval tasks, we find that mean pooling captures context better than the alternatives once coreference resolution has been applied. In QA tasks, we find that smaller models benefit more from the disambiguation process, likely because of their limited inherent capacity for handling referential ambiguity. With these findings, this study aims to deepen the understanding of the challenges that coreferential complexity poses in RAG and to offer guidance for improving retrieval and generation in knowledge-intensive AI applications.

Problem

Research questions and friction points this paper is trying to address.

Coreference complexity hinders RAG system effectiveness

Entity coreference disrupts retrieval and generative performance

Ambiguity in documents reduces QA accuracy in RAG
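To make the ambiguity problem concrete, here is a minimal sketch of what coreference resolution does to a passage before it is indexed or placed in a prompt. It is an illustration only, not the paper's method: the function name `resolve_mentions` and the cluster format (a canonical string plus end-exclusive token spans) are assumptions made for this example, and a real system would obtain the clusters from a coreference model rather than by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical cluster format for this sketch:
# (canonical mention text, [(start, end), ...]) with end-exclusive token spans.
Cluster = Tuple[str, List[Tuple[int, int]]]


def resolve_mentions(tokens: List[str], clusters: List[Cluster]) -> str:
    """Replace each listed mention span with its cluster's canonical text."""
    # Map each span start to (span end, replacement text).
    replacements: Dict[int, Tuple[int, str]] = {}
    for canonical, spans in clusters:
        for start, end in spans:
            replacements[start] = (end, canonical)

    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, canonical = replacements[i]
            out.append(canonical)
            i = end  # skip the tokens covered by the mention
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = "Marie Curie won the prize . She shared it with Pierre .".split()
clusters = [
    ("Marie Curie", [(6, 7)]),  # "She"
    ("the prize", [(8, 9)]),    # "it"
]
resolved = resolve_mentions(tokens, clusters)
print(resolved)
```

After resolution, the second sentence names its entities explicitly, so it can be matched against a query about Marie Curie even if it is chunked or retrieved without the first sentence.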

Innovation

Methods, ideas, or system contributions that make the work stand out.

Coreference resolution enhances RAG retrieval effectiveness

Mean pooling excels post-coreference resolution in retrieval

Smaller models gain more from coreference disambiguation
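For readers unfamiliar with the pooling strategies compared above, the following sketch contrasts mean pooling with first-token ([CLS]-style) pooling over a sequence of token embeddings. The vectors are toy values, and the helper names `mean_pool` and `first_token_pool` are assumptions for this illustration; the paper's actual encoders and configurations are not reproduced here.

```python
from typing import List


def mean_pool(token_vecs: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vecs[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vecs, mask):
        if keep:
            count += 1
            for j, value in enumerate(vec):
                total[j] += value
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [t / count for t in total]


def first_token_pool(token_vecs: List[List[float]]) -> List[float]:
    """Use only the first token's vector as the sequence representation."""
    return list(token_vecs[0])


# Toy 2-dimensional token embeddings; the last row stands in for padding.
token_vecs = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

mean_vec = mean_pool(token_vecs, mask)
first_vec = first_token_pool(token_vecs)
print(mean_vec, first_vec)
```

Because mean pooling gives every non-padding token equal weight, a pronoun that has been replaced by an explicit entity name contributes directly to the passage vector. This is one plausible reading of why the two techniques combine well, offered here as intuition rather than as a result from the paper.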
