AI Summary
In low-resource open-domain question answering, models must make effective use of external knowledge while limiting the damage from irrelevant or harmful retrieved passages. Method: This paper proposes Collaborative RAG (CoRAG), a paradigm that weighs the benefit of a rich shared knowledge base against the retrieval noise it can introduce. It introduces a client-side collaborative training framework with a shared vector index and cooperative passage storage, enabling joint model optimization and collective knowledge curation. Crucially, it is the first to formally model the trade-off among relevant passages, irrelevant passages, and hard negatives. Contribution/Results: Evaluated on the newly constructed CRAB benchmark, Collaborative RAG significantly outperforms both parametric collaborative learning and local RAG baselines. It is more effective and more robust in few-shot settings, empirically validating the critical role of collaborative knowledge bases in low-resource RAG.
Abstract
Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive tasks, especially under few-shot learning constraints. We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. To evaluate CoRAG, we introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering. Our experiments demonstrate that CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource scenarios. Further analysis reveals the critical importance of relevant passages within the shared store, the surprising benefits of incorporating irrelevant passages, and the potential for hard negatives to degrade performance. This introduces a novel consideration in collaborative RAG: the trade-off between leveraging a collectively enriched knowledge base and the risk of incorporating detrimental passages from other clients. Our findings underscore the viability of CoRAG, while also highlighting key design challenges and promising avenues for future research.
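To make the idea of a collaborative passage store concrete, here is a minimal sketch in which several clients contribute passages to one shared store that any of them can query. It is not CoRAG's implementation: the class `SharedPassageStore`, the client IDs, and the example passages are all hypothetical, and a simple bag-of-words cosine similarity stands in for the learned dense retriever a real RAG system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SharedPassageStore:
    """Hypothetical shared store: every client can add to it and read from it."""

    def __init__(self):
        self.entries = []  # list of (client_id, passage, bag-of-words vector)

    def add(self, client_id, passage):
        self.entries.append((client_id, passage, Counter(tokenize(passage))))

    def retrieve(self, query, k=2):
        """Return the k passages most similar to the query, best first."""
        q = Counter(tokenize(query))
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(client_id, passage) for client_id, passage, _ in ranked[:k]]


store = SharedPassageStore()
# Relevant passage: answers the question below.
store.add("client_a", "Paris is the capital of France.")
# Hard negative: topically close to the question, but does not answer it.
store.add("client_b", "Lyon is the capital of the Rhone department in France.")
# Irrelevant passage: unrelated to the question.
store.add("client_c", "Bananas are rich in potassium.")

top = store.retrieve("What is the capital of France?", k=2)
for client_id, passage in top:
    print(client_id, "|", passage)
```

In this toy setup the hard negative from `client_b` ranks second and is passed to the reader alongside the relevant passage, while the irrelevant passage from `client_c` is filtered out. That is one simple way to picture the trade-off the abstract describes: pooling passages enlarges the knowledge base, but it also lets other clients' misleading passages reach the model.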