CoverageBench: Evaluating Information Coverage across Tasks and Domains

📅 2026-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing retrieval evaluation metrics, such as precision and recall, struggle to assess the breadth of information coverage in retrieved results, particularly in retrieval-augmented generation (RAG) scenarios where it is critical to capture diverse key information. To address this limitation, this work introduces CoverageBench, the first cross-task, multi-domain benchmark specifically designed for evaluating information coverage. CoverageBench integrates topics, fine-grained information nuggets, relevance labels, and baseline rankings, moving beyond traditional document-level relevance paradigms. Released via Hugging Face Datasets, the benchmark enables reproducible, quantitative evaluation of the diversity and comprehensiveness of retrieved information, establishing a standardized platform for advancing research on information coverage in retrieval systems.

📝 Abstract
We wish to measure the information coverage of an ad hoc retrieval algorithm, that is, how much of the range of available relevant information is covered by the search results. Information coverage is a central aspect of retrieval, especially when the retrieval system is integrated with generative models in a retrieval-augmented generation (RAG) system. The classic metrics for ad hoc retrieval, precision and recall, reward a system as more and more relevant documents are retrieved. However, since relevance in ad hoc test collections is defined for a document without any relation to other documents that might contain the same information, high recall is sufficient but not necessary to ensure coverage. The same is true for other metrics such as rank-biased precision (RBP), normalized discounted cumulative gain (nDCG), and mean average precision (MAP). Test collections developed around the notion of diversity ranking in web search incorporate multiple aspects that support a concept of coverage in the web domain. In this work, we construct a suite of collections for evaluating information coverage from existing collections. This suite offers researchers a unified testbed spanning multiple genres and tasks. All topics, nuggets, relevance labels, and baseline rankings are released on Hugging Face Datasets, along with instructions for accessing the publicly available document collections.
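The abstract's key distinction can be illustrated with a small sketch (not code from the paper; the documents and nugget assignments below are hypothetical): when several relevant documents duplicate the same information nuggets, a run can have low document-level recall yet still cover every nugget.

```python
# Hypothetical example: document-level recall vs. nugget-level coverage.
relevant = {"A", "B", "C", "D"}      # all relevant documents for a topic
nuggets = {                          # information nuggets each document contains
    "A": {"n1", "n2"},
    "B": {"n3"},
    "C": {"n1", "n2"},               # C and D duplicate the content of A and B
    "D": {"n3"},
}
all_nuggets = set().union(*nuggets.values())

def recall(retrieved):
    """Fraction of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def coverage(retrieved):
    """Fraction of distinct nuggets present in the retrieved documents."""
    covered = set().union(*(nuggets.get(d, set()) for d in retrieved))
    return len(covered) / len(all_nuggets)

run = {"A", "B"}                     # retrieves only half the relevant documents
print(recall(run))                   # 0.5 -- low document-level recall
print(coverage(run))                 # 1.0 -- yet full information coverage
```

This is exactly the sense in which high recall is sufficient but not necessary for coverage: retrieving all four documents would also yield coverage 1.0, but the two-document run already does.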
Problem

Research questions and friction points this paper is trying to address.

information coverage
ad hoc retrieval
retrieval evaluation
RAG
test collections
Innovation

Methods, ideas, or system contributions that make the work stand out.

information coverage
retrieval evaluation
diversity ranking
test collection
retrieval-augmented generation