AI Summary
To address weak cross-domain generalization and the lack of diverse evaluation benchmarks in multi-domain Retrieval-Augmented Generation (RAG), this paper introduces MultiRAG-Bench, the first comprehensive RAG benchmark spanning 13 domains. It further proposes Seq-KD, a sequence-level knowledge distillation method that supervises student model training with high-quality teacher-generated answer sequences, improving zero-shot generalization to unseen domains. Experiments show that Seq-KD consistently outperforms standard fine-tuning on cross-domain question answering, with an average accuracy gain of +4.2%; improvements are especially pronounced in low-resource domains. This work establishes a new benchmark for evaluating the generality of RAG models and introduces a robust training paradigm grounded in teacher-guided sequence distillation.
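The core idea of sequence-level distillation can be made concrete with a minimal, purely illustrative sketch: the student is trained against the teacher's own decoded answer sequence rather than gold labels. All names and probability values below are assumptions for illustration, not the paper's actual implementation:

```python
import math

def teacher_generate(teacher_probs):
    """Greedy decode: take the teacher's argmax token at each step.
    In Seq-KD this generated sequence replaces the gold label."""
    return [max(step, key=step.get) for step in teacher_probs]

def seq_kd_loss(student_probs, teacher_sequence):
    """Negative log-likelihood of the student on the teacher-generated
    sequence, i.e. ordinary cross-entropy with teacher output as target."""
    return -sum(math.log(step[tok])
                for step, tok in zip(student_probs, teacher_sequence))

# Toy per-step distributions over a 3-token vocabulary (assumed values).
teacher_probs = [{"a": 0.7, "b": 0.2, "c": 0.1},
                 {"a": 0.1, "b": 0.8, "c": 0.1}]
student_probs = [{"a": 0.5, "b": 0.3, "c": 0.2},
                 {"a": 0.2, "b": 0.6, "c": 0.2}]

target = teacher_generate(teacher_probs)        # -> ["a", "b"]
loss = seq_kd_loss(student_probs, target)       # -> -ln(0.5) - ln(0.6)
```

Compared with token-level distillation (matching per-step probability distributions), training on whole teacher-decoded sequences gives the student a single coherent target per example, which is the "more coherent supervision" the abstract refers to.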
Abstract
Retrieval-Augmented Generation (RAG) improves LLM factuality, but multi-domain applications suffer from a lack of diverse benchmarks and poor out-of-domain generalization. Our first contribution is a diverse benchmark of question-answering tasks drawn from 8 sources and covering 13 domains. Our second contribution is a systematic evaluation of out-of-domain generalization for typical RAG tuning strategies. We find that standard fine-tuning fails to generalize effectively, whereas sequence-level distillation with teacher-generated labels improves out-of-domain performance by providing more coherent supervision. Our findings highlight key strategies for improving multi-domain RAG robustness.