🤖 AI Summary
To address the limitations of existing RAG systems in scientific question answering, namely low answer faithfulness, poor citation verifiability, and inefficient retrieval from million-scale scholarly corpora, this paper proposes a multi-agent RAG framework tailored for academic QA. The framework orchestrates four specialized agents that jointly perform question decomposition, hybrid sparse-dense retrieval, adaptive document filtering, and answer generation with in-line citations. It introduces a traceable citation mechanism that makes every factual claim directly attributable to its original source. The authors also release a rigorous evaluation benchmark of 1,000 question-answer-evidence triplets. Evaluated on a corpus of 2.3 million papers, the method improves faithfulness, answer relevance, and contextual relevance by up to +0.088 (12%) over strong baselines, enhancing both the trustworthiness and scalability of scientific QA systems.
📝 Abstract
We present SQuAI (https://squai.scads.ai/), a scalable and trustworthy multi-agent retrieval-augmented generation (RAG) framework for scientific question answering (QA) with large language models (LLMs). SQuAI addresses key limitations of existing RAG systems in the scholarly domain, where complex, open-domain questions demand accurate answers, explicit claims with citations, and retrieval across millions of scientific documents. Built on over 2.3 million full-text papers from arXiv.org, SQuAI employs four collaborative agents to decompose complex questions into sub-questions, retrieve targeted evidence via hybrid sparse-dense retrieval, and adaptively filter documents to improve contextual relevance. To ensure faithfulness and traceability, SQuAI integrates in-line citations for each generated claim and provides supporting sentences from the source documents. Our system improves faithfulness, answer relevance, and contextual relevance by up to +0.088 (12%) over a strong RAG baseline. We further release a benchmark of 1,000 scientific question-answer-evidence triplets to support reproducibility. With transparent reasoning, verifiable citations, and domain-wide scalability, SQuAI demonstrates how multi-agent RAG enables more trustworthy scientific QA with LLMs.
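The hybrid sparse-dense retrieval step can be pictured as merging two ranked candidate lists, one from a keyword (sparse) index and one from an embedding (dense) index. The abstract does not specify SQuAI's fusion method, so the sketch below uses reciprocal rank fusion (RRF), a common technique for this purpose; the function name and document IDs are illustrative, not part of the system.

```python
# Hypothetical sketch of hybrid sparse-dense result fusion via
# reciprocal rank fusion (RRF). SQuAI's actual fusion strategy is
# not described in the abstract; RRF is shown as one common option.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in; higher fused score means better.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: a sparse (keyword) ranking and a dense (embedding) ranking.
sparse = ["paperA", "paperB", "paperC"]
dense = ["paperB", "paperD", "paperA"]
fused = reciprocal_rank_fusion([sparse, dense])
# paperB ranks first: it is near the top of both lists.
```

RRF needs only ranks, not raw scores, which sidesteps the problem that BM25-style sparse scores and cosine-similarity dense scores live on incomparable scales.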