MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges in multi-turn retrieval-augmented generation (RAG) dialogues, where existing models struggle with unanswerable, underspecified, and non-standalone questions, as well as unclear responses. The study presents the first systematic definition and annotation of these four categories of difficulty, introducing a comprehensive multi-turn RAG benchmark comprising 666 tasks spanning six domains and over 2,800 dialogue turns, accompanied by a supporting corpus that enables joint evaluation of retrieval and generation components. Experimental results demonstrate that current state-of-the-art models exhibit significant performance limitations in these challenging scenarios, establishing a foundational benchmark and a clear direction for future research in this area.

📝 Abstract
We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval-augmented generation, a popular use of large language models. We release a benchmark of 666 tasks containing over 2,800 conversation turns across 6 domains with accompanying corpora. Our experiments show that retrieval and generation models continue to struggle on conversations with UNanswerable, UNderspecified, and NONstandalone questions and UNclear responses. Our benchmark is available at https://github.com/IBM/mt-rag-benchmark.
Problem

Research questions and friction points this paper is trying to address.

multi-turn RAG
unanswerable questions
underspecified questions
non-standalone questions
unclear responses
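The four difficulty categories above lend themselves to per-category evaluation. A minimal sketch of how one might aggregate per-turn scores by category is shown below; the field names (`category`, `score`) and category labels are assumptions for illustration, not the repository's actual schema or API.

```python
# Hypothetical sketch: averaging per-turn scores within each difficulty
# category of a multi-turn RAG benchmark shaped like MTRAG-UN.
from collections import defaultdict

# Assumed category labels, mirroring the paper's four challenge types.
CATEGORIES = ("unanswerable", "underspecified", "non_standalone", "unclear_response")

def score_by_category(turns):
    """Average a per-turn `score` within each category label."""
    totals, counts = defaultdict(float), defaultdict(int)
    for turn in turns:
        cat = turn.get("category", "standard")
        totals[cat] += turn["score"]
        counts[cat] += 1
    return {cat: totals[cat] / counts[cat] for cat in counts}

# Toy example with made-up scores.
turns = [
    {"category": "unanswerable", "score": 0.2},
    {"category": "unanswerable", "score": 0.4},
    {"category": "underspecified", "score": 0.5},
]
print(score_by_category(turns))
```

Breaking scores out this way makes it easy to see whether a model's failures concentrate in one category (e.g. unanswerable questions) rather than being spread uniformly across the conversation.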
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Turn RAG
Unanswerable Questions
Underspecified Queries
Benchmark
Retrieval-Augmented Generation