CoRAG: Collaborative Retrieval-Augmented Generation

📅 2025-04-02



🤖 AI Summary

In low-resource open-domain question answering, knowledge-intensive tasks must make effective use of external knowledge while limiting the damage done by irrelevant or harmful retrieved passages. Method: This paper proposes CoRAG (Collaborative RAG), which extends RAG to a collaborative setting. Clients jointly train a shared model and retrieve from a shared index over a collaboratively built passage store, so that each client benefits from a collectively enriched knowledge base. The paper identifies a novel trade-off in this setting by analyzing three kinds of passages in the shared store: relevant passages, irrelevant passages, and hard negatives. Contribution/Results: On CRAB, a newly constructed benchmark for collaborative homogeneous open-domain question answering, CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource (few-shot) settings. Relevant passages in the shared store prove critical, irrelevant passages surprisingly help, and hard negatives can hurt performance, which underlines both the value of a collaborative knowledge base and the risk of detrimental passages contributed by other clients.
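To make the idea of a shared passage store concrete, here is a minimal sketch under stated assumptions. SharedPassageStore, contribute, and retrieve are hypothetical names, and retrieval uses a toy bag-of-words cosine similarity rather than the learned retriever a real RAG system would use. It only shows how one client can retrieve a passage that another client contributed.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts (toy tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SharedPassageStore:
    """Hypothetical pooled store: each client contributes passages and
    every client retrieves from the union of all contributions."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []  # (client_id, passage)

    def contribute(self, client_id: str, passages: list[str]) -> None:
        self.entries.extend((client_id, p) for p in passages)

    def retrieve(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        q = bow(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, bow(e[1])), reverse=True)
        return ranked[:k]


store = SharedPassageStore()
store.contribute("client_a", ["The Eiffel Tower is in Paris."])
store.contribute("client_b", ["Mount Everest is the highest mountain on Earth."])

# client_a contributed nothing about mountains, but the pooled store can answer.
print(store.retrieve("What is the highest mountain?"))
# [('client_b', 'Mount Everest is the highest mountain on Earth.')]
```

Pooling at the passage level is what distinguishes this from purely parametric collaboration: a client gains knowledge it never held locally without that knowledge having to be absorbed into model weights.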


📝 Abstract

Retrieval-Augmented Generation (RAG) models excel in knowledge-intensive tasks, especially under few-shot learning constraints. We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. To evaluate CoRAG, we introduce CRAB, a benchmark for collaborative homogeneous open-domain question answering. Our experiments demonstrate that CoRAG consistently outperforms both parametric collaborative learning methods and locally trained RAG models in low-resource scenarios. Further analysis reveals the critical importance of relevant passages within the shared store, the surprising benefits of incorporating irrelevant passages, and the potential for hard negatives to negatively impact performance. This introduces a novel consideration in collaborative RAG: the trade-off between leveraging a collectively enriched knowledge base and the potential risk of incorporating detrimental passages from other clients. Our findings underscore the viability of CoRAG, while also highlighting key design challenges and promising avenues for future research.
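The abstract says clients jointly train a shared model but does not spell out the training procedure. The sketch below assumes a FedAvg-style round (one local gradient step per client, then an average of client parameters weighted by local dataset size), a standard parametric collaborative learning scheme. local_sgd_step, fed_avg, and the toy gradients are hypothetical, not the paper's code. In a CoRAG-like setup the passage store would be shared separately rather than averaged, since it is not a model parameter.

```python
def local_sgd_step(params: list[float], grads: list[float], lr: float = 0.1) -> list[float]:
    """One local gradient step on a client (gradients are given, not computed here)."""
    return [p - lr * g for p, g in zip(params, grads)]


def fed_avg(client_params: list[list[float]], num_examples: list[int]) -> list[float]:
    """Average client parameters, weighting each client by its number of local examples."""
    total = sum(num_examples)
    return [
        sum(params[i] * n for params, n in zip(client_params, num_examples)) / total
        for i in range(len(client_params[0]))
    ]


global_params = [1.0, 2.0]
# Hypothetical per-client gradients and dataset sizes (low-resource: few examples each).
client_grads = {"client_a": [1.0, 0.0], "client_b": [0.0, 1.0]}
client_sizes = {"client_a": 10, "client_b": 30}

updated = [local_sgd_step(global_params, g) for g in client_grads.values()]
global_params = fed_avg(updated, list(client_sizes.values()))
print(global_params)  # ≈ [0.975, 1.925]
```

Weighting by example count lets clients with more data pull the shared model further; with equal sizes it reduces to a plain mean.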

Problem

Research questions and friction points this paper is trying to address.

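Extending RAG to settings where multiple clients collaboratively train a shared model

Evaluating collaborative RAG on CRAB, a benchmark for collaborative homogeneous open-domain QA

Understanding the trade-off between a collectively enriched passage store and detrimental passages contributed by other clients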

Innovation

Methods, ideas, or system contributions that make the work stand out.

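Extends RAG to collaborative training, where clients jointly train a shared model

Retrieves from a collaborative passage store shared across clients

Identifies the trade-off between an enriched shared knowledge base and detrimental passages, such as hard negatives, from other clients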
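The trade-off can be illustrated with a toy example that is not taken from the paper. A hypothetical top1 helper with a bag-of-words cosine retriever picks the single best passage for a query. Adding an irrelevant passage from another client leaves the top-1 result unchanged, while a hard negative (topically close but not answering the question) outranks the relevant passage. This mirrors the risk the paper describes; it does not reproduce the paper's finding that irrelevant passages can actually help.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts (toy lexical retriever, an assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top1(query: str, store: list[str]) -> str:
    """Return the passage in the store most similar to the query."""
    q = bow(query)
    return max(store, key=lambda p: cosine(q, bow(p)))


QUERY = "What is the capital of Australia?"
RELEVANT = "Canberra, not Sydney or Melbourne, is the national capital of Australia."
IRRELEVANT = "The Nile is the longest river in Africa."
HARD_NEGATIVE = "Sydney is the capital of New South Wales, Australia."

local_store = [RELEVANT]
with_irrelevant = local_store + [IRRELEVANT]        # another client adds noise
with_hard_negative = local_store + [HARD_NEGATIVE]  # another client adds a hard negative

for name, store in [
    ("local only", local_store),
    ("+ irrelevant", with_irrelevant),
    ("+ hard negative", with_hard_negative),
]:
    print(f"{name}: {top1(QUERY, store)}")
# Prints the RELEVANT passage for the first two stores and HARD_NEGATIVE for the third.
```

The hard negative wins because it is shorter and shares the same query terms, so its normalized overlap is higher; a reader conditioned on it would likely answer "Sydney".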
