🤖 AI Summary
To address inaccurate source attribution, inconsistent multilingual performance, and insufficient factual grounding in small-scale RAG models, this paper introduces Pleias-RAG-350m and Pleias-RAG-1B, two lightweight, purpose-built models. Methodologically, they are mid-trained on mid-scale synthetic data that emulates multi-stage RAG workflows, cross-lingual retrieval, and literal citation generation. The models natively support verbatim citation and factual provenance tracking, and integrate query routing, query reformulation, and source reranking. The key contribution is the first demonstration of consistent RAG performance across major European languages, with systematic citation grounding, in the sub-1B-parameter regime. Experiments show that both models significantly outperform comparable sub-4B models on benchmarks including HotPotQA and 2WikiMultihop while remaining competitive with larger models such as Qwen2.5-7B, and their small size enables efficient CPU and edge-device deployment.
📝 Abstract
We introduce a new generation of small reasoning models for RAG, search, and source summarization. Pleias-RAG-350m and Pleias-RAG-1B are mid-trained on a large synthetic dataset emulating the retrieval of a wide variety of multilingual open sources from the Common Corpus. They provide native support for citation and grounding with literal quotes and reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking. Pleias-RAG-350m and Pleias-RAG-1B outperform SLMs below 4 billion parameters on standardized RAG benchmarks (HotPotQA, 2wiki) and are competitive with popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. They are the only SLMs to date maintaining consistent RAG performance across leading European languages and ensuring systematic reference grounding for statements. Due to their small size, their ease of deployment on constrained infrastructure, and their higher factuality by design, the models unlock a range of new use cases for generative AI.
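The workflow the abstract describes (source reranking followed by an answer grounded in literal quotes) can be pictured with a toy sketch. The function names and the term-overlap scoring heuristic below are illustrative assumptions for exposition only, not the paper's actual models or implementation.

```python
# Toy sketch of a grounded RAG step: rerank retrieved sources by naive
# term overlap with the query, then answer by citing a verbatim quote
# from the top source. Not Pleias-RAG's implementation.

def rerank(query: str, sources: list[str]) -> list[tuple[int, str]]:
    """Order (index, text) pairs by term overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        enumerate(sources),
        key=lambda pair: len(q & set(pair[1].lower().split())),
        reverse=True,
    )

def answer_with_citation(query: str, sources: list[str]) -> str:
    """Answer with a numbered reference and a literal quote,
    so every statement stays traceable to its source."""
    idx, text = rerank(query, sources)[0]
    return f'According to source [{idx + 1}]: "{text}"'

sources = [
    "The Common Corpus is a large open multilingual dataset.",
    "Paris is the capital of France.",
]
print(answer_with_citation("capital of France", sources))
```

A real system would replace the overlap heuristic with a learned reranker and let the model select the supporting span itself; the point here is only the shape of the pipeline: route, rerank, then ground the answer in a verbatim quote.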