🤖 AI Summary
Rare diseases suffer from limited annotated medical imaging data, which severely constrains AI-based diagnostic performance. To address this, we propose RADAR (Retrieval Augmented Diagnostic Reasoning Agents), the first retrieval-augmented reasoning agent system for rare disease diagnosis from brain MRI, explicitly modeled on how radiologists consult case reports and literature when they meet unfamiliar findings. RADAR requires no additional model training and improves both diagnostic capability on unseen diseases and decision interpretability. It embeds clinical case reports and biomedical literature with sentence transformers, indexes them with FAISS for efficient similarity search, and is designed as a model-agnostic reasoning module that can be integrated with diverse large language models. Evaluated on the NOVA dataset, which comprises 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain. The strongest improvements are observed for open-source models such as DeepSeek, and the retrieved examples provide interpretable, literature-grounded explanations for each diagnosis.
📝 Abstract
Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consult case reports and literature when confronted with unfamiliar findings. Following this practice, we introduce RADAR, Retrieval Augmented Diagnostic Reasoning Agents, an agentic system for rare disease detection in brain MRI. Our approach gives AI agents access to external medical knowledge: both case reports and literature are embedded with sentence transformers and indexed with FAISS to enable efficient similarity search. The agent retrieves clinically relevant evidence to guide diagnostic decision-making on unseen diseases, without the need for additional training. Designed as a model-agnostic reasoning module, RADAR can be seamlessly integrated with diverse large language models, consistently improving their rare pathology recognition and interpretability. On the NOVA dataset, comprising 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain, with the strongest improvements observed for open-source models such as DeepSeek. Beyond accuracy, the retrieved examples provide interpretable, literature-grounded explanations, highlighting retrieval-augmented reasoning as a powerful paradigm for low-prevalence conditions in medical imaging.
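The embed, index, retrieve, and prompt loop described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: a bag-of-words vector stands in for a sentence-transformer embedding, a brute-force cosine scan stands in for a FAISS index, and the corpus strings, function names (`embed`, `build_index`, `retrieve`, `build_prompt`), and prompt wording are all hypothetical placeholders chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; a real system would index case reports and literature.
CORPUS = [
    "Case report: ring-enhancing lesion in the left temporal lobe with surrounding edema.",
    "Review: symmetric T2 hyperintensity of the basal ganglia in metabolic disorders.",
    "Case report: cerebellar atrophy with the hot cross bun sign in the pons.",
]


def embed(text):
    """L2-normalised bag-of-words vector (stand-in for a sentence-transformer encoder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a, b):
    """Dot product of two normalised sparse vectors, i.e. their cosine similarity."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def build_index(docs):
    """Precompute one vector per document (stand-in for adding vectors to a FAISS index)."""
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=2):
    """Return the k most similar documents as (score, text) pairs, best first."""
    q = embed(query)
    scored = [(cosine(q, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(findings, evidence):
    """Assemble a model-agnostic text prompt that any LLM backend could consume."""
    lines = ["Imaging findings: " + findings, "Retrieved evidence:"]
    for rank, (score, doc) in enumerate(evidence, start=1):
        lines.append(f"[{rank}] (similarity {score:.2f}) {doc}")
    lines.append("Using the evidence above, give the most likely diagnosis and cite the entries used.")
    return "\n".join(lines)


findings = "pons shows hot cross bun sign with cerebellar atrophy"
index = build_index(CORPUS)
hits = retrieve(index, findings, k=2)
prompt = build_prompt(findings, hits)
print(prompt)
```

Because the prompt is plain text with numbered evidence entries, the same retrieval step can sit in front of different language models, and each cited entry can be traced back to its source document. Swapping in real sentence embeddings and a FAISS index would change only `embed` and `build_index`/`retrieve`.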