🤖 AI Summary
To address hallucination, shallow reasoning, and poor clinical interpretability in large language models (LLMs) for medical question answering, this paper proposes the first causality-aware retrieval-augmented generation (RAG) framework. Our method integrates causal reasoning into medical RAG: (1) causal-aware document retrieval, aligned with diagnostic logic, to enhance evidence relevance; and (2) domain-customized structured chain-of-thought prompting to enable stepwise, traceable clinical reasoning. Unlike conventional semantic-matching RAG, our framework explicitly models causal relationships among symptoms, diseases, and treatments, thereby strengthening reasoning depth and interpretability. Evaluated on three medical QA benchmarks, our approach achieves 10.3% and 6.4% absolute accuracy gains over baseline RAG and state-of-the-art domain-adapted methods, respectively, while significantly improving reasoning consistency.
📝 Abstract
Large language models (LLMs) have shown promise in medical question answering but often struggle with hallucinations and shallow reasoning, particularly in tasks requiring nuanced clinical understanding. Retrieval-augmented generation (RAG) offers a practical and privacy-preserving way to enhance LLMs with external medical knowledge. However, most existing approaches rely on surface-level semantic retrieval and lack the structured reasoning needed for clinical decision support. We introduce MedCoT-RAG, a domain-specific framework that combines causal-aware document retrieval with structured chain-of-thought prompting tailored to medical workflows. This design enables models to retrieve evidence aligned with diagnostic logic and generate step-by-step causal reasoning reflective of real-world clinical practice. Experiments on three diverse medical QA benchmarks show that MedCoT-RAG outperforms strong baselines by up to 10.3% over vanilla RAG and 6.4% over advanced domain-adapted methods, improving accuracy, interpretability, and consistency in complex medical tasks.