🤖 AI Summary
Existing RAG frameworks suffer from poor reproducibility, delayed integration of emerging techniques, and high system overhead. To address these challenges, we propose an open-source RAG framework tailored for research and rapid prototyping, featuring the first unified architecture supporting three retrieval-augmentation paradigms: textual, multimodal, and web-based. The framework has a modular design built on asynchronous I/O and integrates vector, keyword, graph, and web retrievers. It further supports LLM adaptation, dynamic routing, and cache-aware execution, alongside full lifecycle management and persistent caching. Extensive evaluation across multiple benchmark tasks demonstrates low latency, high throughput, and strong cross-modal generalization, significantly improving the development efficiency and reproducibility of RAG systems. The implementation is publicly available.
📝 Abstract
Retrieval-Augmented Generation (RAG) plays a pivotal role in modern large language model applications, and numerous existing frameworks offer a wide range of functionality to facilitate the development of RAG systems. However, we have identified several persistent challenges in these frameworks, including difficulties in reproducing and sharing algorithms, a lack of support for new techniques, and high system overhead. To address these limitations, we introduce FlexRAG, an open-source framework specifically designed for research and prototyping. FlexRAG supports text-based, multimodal, and network-based RAG, providing comprehensive lifecycle support alongside efficient asynchronous processing and persistent caching capabilities. By offering a robust and flexible solution, FlexRAG enables researchers to rapidly develop, deploy, and share advanced RAG systems. Our toolkit and resources are available at https://github.com/ictnlp/FlexRAG.
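
To make the combination of asynchronous processing and persistent caching more concrete, the following is a minimal Python sketch of the general pattern: several retrievers are queried concurrently, and the merged result is stored in an on-disk key-value cache so that repeated queries skip retrieval. This is not FlexRAG's actual API. The names `PersistentCache`, `keyword_retriever`, `vector_retriever`, `retrieve`, and `demo` are hypothetical and chosen only for illustration, and the retrievers are stubs that simulate I/O latency.

```python
import asyncio
import hashlib
import json
import sqlite3


class PersistentCache:
    """Tiny SQLite-backed key-value cache (hypothetical, not FlexRAG's API)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the demo self-contained; pass a file path
        # (an assumption, e.g. "cache.db") to persist across runs.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (k TEXT PRIMARY KEY, v TEXT)"
        )

    def get(self, key):
        row = self.db.execute(
            "SELECT v FROM cache WHERE k = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def put(self, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.db.commit()


async def keyword_retriever(query):
    """Stub retriever; the sleep stands in for real I/O latency."""
    await asyncio.sleep(0.01)
    return [f"keyword hit for {query!r}"]


async def vector_retriever(query):
    """Stub retriever; a real one would query a vector index."""
    await asyncio.sleep(0.01)
    return [f"vector hit for {query!r}"]


async def retrieve(query, retrievers, cache):
    """Return (documents, cache_hit) for a query."""
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    # Run all retrievers concurrently; gather preserves argument order.
    results = await asyncio.gather(*(r(query) for r in retrievers))
    merged = [doc for docs in results for doc in docs]
    cache.put(key, merged)
    return merged, False


def demo():
    cache = PersistentCache()
    retrievers = [keyword_retriever, vector_retriever]
    first = asyncio.run(retrieve("what is RAG?", retrievers, cache))
    second = asyncio.run(retrieve("what is RAG?", retrievers, cache))
    return first, second
```

Calling `demo()` runs the same query twice: the first call fans out to both stub retrievers, and the second is served from the cache. A fuller design would also include the retriever configuration in the cache key so that changing the retriever set invalidates stale entries.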