AI Summary
This work addresses the diversity-aware retrieval task for complex, controversial questions (e.g., "Does ChatGPT do more harm than good?"), proposing a perspective-driven evaluation paradigm. We formally define the open-world document retrieval problem requiring coverage of multiple subjective viewpoints and introduce BERDS, the first benchmark for subjective-question diversity, comprising survey and debate data. To overcome limitations of string-matching metrics, we design an LLM-based automatic perspective identification evaluator. We systematically assess dense retrievers (BERT, ColBERT), query expansion, and diversity-oriented re-ranking across three corpora. Results show that state-of-the-art methods cover all ground-truth perspectives in only 33.74% of cases and exhibit significant viewpoint bias. In contrast, dynamic corpus construction and perspective-aware re-ranking substantially improve viewpoint coverage. This study establishes a new benchmark, evaluation framework, and empirical insights for controversial information retrieval.
Abstract
We study retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., "Will ChatGPT do more harm than good?"). We curate a Benchmark for Retrieval Diversity for Subjective questions (BERDS), where each example consists of a question and diverse perspectives associated with the question, sourced from survey questions and debate websites. On this data, retrievers paired with a corpus are evaluated on their ability to surface a document set that contains diverse perspectives. Our framing diverges from most retrieval tasks in that document relevancy cannot be decided by simple string matches to references. Instead, we build a language-model-based automatic evaluator that decides whether each retrieved document contains a perspective. This allows us to evaluate the performance of three different corpus types (Wikipedia, a web snapshot, and a corpus constructed on the fly from pages retrieved via a search engine) paired with retrievers. Retrieving diverse documents remains challenging: the outputs of existing retrievers cover all perspectives on only 33.74% of the examples. We further study the impact of query expansion and diversity-focused reranking approaches and analyze retriever sycophancy. Together, we lay the foundation for future studies in retrieval diversity handling complex queries.
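The evaluation described above reduces to a simple coverage metric: an example counts as covered only if every ground-truth perspective appears in at least one of the top-k retrieved documents, as decided by the automatic evaluator. The sketch below illustrates that metric with a pluggable judge; the function and field names (`perspective_coverage`, `retrieved`, `perspectives`) and the keyword stand-in judge are illustrative assumptions, not the paper's actual implementation, which prompts a language model instead.

```python
from typing import Callable, Dict, List


def perspective_coverage(
    examples: List[Dict],
    contains_perspective: Callable[[str, str], bool],
    k: int = 5,
) -> float:
    """Fraction of examples whose top-k retrieved documents jointly
    cover every ground-truth perspective (the 'all perspectives
    covered' figure reported in the abstract)."""
    covered_all = 0
    for ex in examples:
        docs = ex["retrieved"][:k]
        if all(
            any(contains_perspective(doc, p) for doc in docs)
            for p in ex["perspectives"]
        ):
            covered_all += 1
    return covered_all / len(examples)


def keyword_judge(doc: str, perspective: str) -> bool:
    """Toy stand-in for the paper's LLM evaluator: in BERDS the judge
    is a prompted language model, not this substring heuristic."""
    return perspective.lower() in doc.lower()
```

In practice the judge call would wrap a prompted LLM, since the whole point of the benchmark is that simple string matching fails to detect perspectives.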