Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family

📅 2025-04-25
🤖 AI Summary
To address inaccurate source attribution, inconsistent multilingual performance, and insufficient factual grounding in small-scale RAG models, this paper introduces Pleias-RAG-350m and Pleias-RAG-1B, a pair of lightweight, purpose-built models. Methodologically, the work relies on mid-training over a large synthetic dataset, multi-stage RAG workflow modeling, cross-lingual retrieval simulation, and literal citation generation. The models natively support verbatim citation and factual provenance tracking, integrating query routing, query rewriting, and source re-ranking. The key contribution is the first demonstration of consistent RAG performance across major European languages, together with systematic citation grounding, within the sub-1B parameter regime. Experiments show that both models significantly outperform comparable sub-4B models on benchmarks including HotPotQA and 2WikiMultihop, match the performance of Qwen2.5-7B, and remain efficient enough for CPU and edge-device deployment.

📝 Abstract
We introduce a new generation of small reasoning models for RAG, search, and source summarization. Pleias-RAG-350m and Pleias-RAG-1B are mid-trained on a large synthetic dataset emulating the retrieval of a wide variety of multilingual open sources from the Common Corpus. They provide native support for citation and grounding with literal quotes and reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking. Pleias-RAG-350m and Pleias-RAG-1B outperform SLMs below 4 billion parameters on standardized RAG benchmarks (HotPotQA, 2wiki) and are competitive with popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. They are the only SLMs to date maintaining consistent RAG performance across leading European languages and ensuring systematic reference grounding for statements. Due to their size and ease of deployment on constrained infrastructure and higher factuality by design, the models unlock a range of new use cases for generative AI.
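The abstract's central claim is grounding with literal quotes: every cited span in an answer should appear verbatim in a retrieved source. A minimal sketch of how such grounding could be verified is shown below; the `<quote id="...">` tag convention and the `verify_citations` helper are assumptions for illustration, not the Pleias-RAG output format.

```python
import re

def verify_citations(answer: str, sources: dict) -> list:
    """Check that each cited span in an answer is a literal quote.

    Assumes a hypothetical convention where the model cites as
    <quote id="src_1">verbatim text</quote>; the actual Pleias-RAG
    citation format may differ.
    """
    results = []
    for match in re.finditer(r'<quote id="([^"]+)">(.*?)</quote>', answer, re.S):
        source_id, quoted = match.group(1), match.group(2)
        # A citation is grounded only if it appears verbatim in its source.
        grounded = quoted in sources.get(source_id, "")
        results.append((source_id, quoted, grounded))
    return results

sources = {
    "src_1": "Common Corpus is a large multilingual open dataset.",
}
answer = (
    'The models train on open data: '
    '<quote id="src_1">a large multilingual open dataset</quote>.'
)
checks = verify_citations(answer, sources)
```

Because the check is a plain substring test against the cited source, any paraphrased or fabricated quote fails it, which is what makes literal quoting auditable.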
Problem

Research questions and friction points this paper is trying to address.

Develop small reasoning models for RAG and source summarization
Enhance citation support with literal quotes in multilingual contexts
Improve RAG performance for models under 4 billion parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mid-trained on synthetic multilingual dataset
Native citation support with literal quotes
Outperform SLMs below 4B parameters
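One workflow feature listed above is source reranking. As an illustrative stand-in (the paper's models perform reranking natively; this lexical-overlap heuristic and the `rerank_sources` name are assumptions, not the paper's method), the step can be sketched as:

```python
def rerank_sources(query: str, sources: list) -> list:
    """Order candidate sources by lexical overlap with the query.

    Toy heuristic standing in for the reranking stage of a RAG
    workflow; not the learned reranking the models perform.
    """
    query_terms = set(query.lower().split())

    def score(text: str) -> int:
        # Count query terms that also occur in the candidate source.
        return len(query_terms & set(text.lower().split()))

    return sorted(sources, key=score, reverse=True)

ranked = rerank_sources(
    "citation grounding in small models",
    [
        "Weather report for Paris.",
        "Grounding citation quotes in small reasoning models.",
    ],
)
```

Here the second source shares several terms with the query and is promoted to the front of the list before generation.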
Pierre-Carl Langlais
PleIAs, Paris, France
Pavel Chizhov
Researcher and Ph.D. Student at CAIRO, THWS
Mattia Nee
PleIAs, Paris, France
Carlos Rosas Hinostroza
PleIAs, Paris, France
Matthieu Delsart
PleIAs, Paris, France
Irene Girard
PleIAs, Paris, France
Othman Hicheur
PleIAs, Paris, France
Anastasia Stasenko
PleIAs, Paris, France
Ivan P. Yamshchikov
Research Professor at CAIRO, THWS