🤖 AI Summary
To address hallucination in large language models (LLMs) within high-barrier domains such as pathology, this paper proposes YpathRAG, a retrieval-augmented generation (RAG) framework tailored for pathology. Methodologically, it constructs a pathology vector database comprising 1.53 million paragraphs across 28 subspecialties, employs a dual-channel hybrid retrieval strategy that combines BGE-M3 dense retrieval with vocabulary-guided sparse retrieval, and introduces an LLM-driven supportive-evidence judgment module that closes the loop between retrieval, judgment, and generation. Key contributions include: (1) releasing two pathology-specific evaluation benchmarks, YpathR and YpathQA-M; (2) achieving Recall@5 of 98.64% on YpathR, 23 percentage points above the baseline; and (3) improving the accuracy of general-purpose and medical LLMs on YpathQA-M by 9.0% on average and by up to 15.6%, indicating better retrieval quality and factual reliability.
📝 Abstract
Large language models (LLMs) excel on general tasks yet still hallucinate in high-barrier domains such as pathology. Prior work often relies on domain fine-tuning, which neither expands the knowledge boundary nor enforces evidence-grounded constraints. We therefore build a pathology vector database covering 28 subfields and 1.53 million paragraphs, and present YpathRAG, a pathology-oriented RAG framework with dual-channel hybrid retrieval (BGE-M3 dense retrieval coupled with vocabulary-guided sparse retrieval) and an LLM-based supportive-evidence judgment module that closes the retrieval-judgment-generation loop. We also release two evaluation benchmarks, YpathR and YpathQA-M. On YpathR, YpathRAG attains Recall@5 of 98.64%, a gain of 23 percentage points over the baseline; on YpathQA-M, a set of the 300 most challenging questions, it increases the accuracies of both general and medical LLMs by 9.0% on average and up to 15.6%. These results demonstrate improved retrieval quality and factual reliability, providing a scalable construction paradigm and interpretable evaluation for pathology-oriented RAG.
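To make the dual-channel retrieval and the Recall@5 metric concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the two ranked lists are merged with reciprocal rank fusion (a common generic technique, not necessarily the fusion rule YpathRAG uses), and Recall@k is computed as the fraction of queries with at least one relevant paragraph in the top k. The function names `rrf_fuse` and `recall_at_k`, the constant `k=60`, and the toy paragraph IDs are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Merge two ranked lists of paragraph IDs with reciprocal rank fusion.

    Assumption: RRF stands in for whatever fusion rule the paper uses.
    """
    scores = defaultdict(float)
    for ranking in (dense_ranking, sparse_ranking):
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def recall_at_k(rankings, relevant, k=5):
    """Fraction of queries with at least one relevant paragraph in the top k.

    Assumption: this hit-rate definition may differ from the paper's.
    """
    hits = sum(
        1 for qid, ranked in rankings.items() if relevant[qid] & set(ranked[:k])
    )
    return hits / len(rankings)


# Toy data: IDs returned by a dense channel and a sparse channel.
dense = ["p1", "p2", "p3"]
sparse = ["p3", "p4", "p1"]
fused = rrf_fuse(dense, sparse)

rankings = {"q1": fused, "q2": ["p9", "p8"]}
relevant = {"q1": {"p3"}, "q2": {"p7"}}
score = recall_at_k(rankings, relevant, k=2)
```

Paragraphs that both channels rank highly rise to the top of the fused list, which is the general motivation for hybrid retrieval; in a real system the dense and sparse rankings would come from an embedding index and a term-based index rather than hard-coded lists.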