🤖 AI Summary
To address inaccurate source attribution, inconsistent multilingual performance, and insufficient factual grounding in small-scale RAG models, this paper introduces Pleias-RAG-350m and Pleias-RAG-1B, two lightweight, purpose-built models. Methodologically, they are mid-trained on mid-scale synthetic data that emulates multi-stage RAG workflows, cross-lingual retrieval, and literal citation generation. The models natively support verbatim citation and factual provenance tracking, and integrate query routing, query reformulation, and source reranking. The key contribution is the first demonstration of consistent RAG performance across major European languages, with systematic citation grounding, in the sub-1B-parameter regime. Experiments show that both models significantly outperform comparable sub-4B models on benchmarks including HotPotQA and 2WikiMultihop while remaining competitive with larger models such as Qwen2.5-7B, and their small size enables efficient CPU and edge-device deployment.
📝 Abstract
We introduce a new generation of small reasoning models for RAG, search, and source summarization. Pleias-RAG-350m and Pleias-RAG-1B are mid-trained on a large synthetic dataset emulating the retrieval of a wide variety of multilingual open sources from the Common Corpus. They provide native support for citation and grounding with literal quotes and reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking. Pleias-RAG-350m and Pleias-RAG-1B outperform SLMs below 4 billion parameters on standardized RAG benchmarks (HotPotQA, 2wiki) and are competitive with popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. They are the only SLMs to date maintaining consistent RAG performance across leading European languages and ensuring systematic reference grounding for statements. Due to their small size, their ease of deployment on constrained infrastructure, and their higher factuality by design, the models unlock a range of new use cases for generative AI.
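The workflow the abstract describes (source reranking followed by an answer grounded in literal quotes) can be pictured with a toy sketch. The function names and the term-overlap scoring heuristic below are illustrative assumptions for exposition only, not the paper's actual models or implementation.

```python
# Toy sketch of a grounded RAG step: rerank retrieved sources by naive
# term overlap with the query, then answer by citing a verbatim quote
# from the top source. Not Pleias-RAG's implementation.

def rerank(query: str, sources: list[str]) -> list[tuple[int, str]]:
    """Order (index, text) pairs by term overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        enumerate(sources),
        key=lambda pair: len(q & set(pair[1].lower().split())),
        reverse=True,
    )

def answer_with_citation(query: str, sources: list[str]) -> str:
    """Answer with a numbered reference and a literal quote,
    so every statement stays traceable to its source."""
    idx, text = rerank(query, sources)[0]
    return f'According to source [{idx + 1}]: "{text}"'

sources = [
    "The Common Corpus is a large open multilingual dataset.",
    "Paris is the capital of France.",
]
print(answer_with_citation("capital of France", sources))
```

A real system would replace the overlap heuristic with a learned reranker and let the model select the supporting span itself; the point here is only the shape of the pipeline: route, rerank, then ground the answer in a verbatim quote.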