🤖 AI Summary
This work addresses the limitations of conventional retrieval-augmented generation (RAG) in handling the complex reasoning and patient-to-patient variability inherent in traditional Chinese medicine (TCM) syndrome differentiation and treatment. To bridge this gap, the authors propose TCM-DiffRAG, a RAG framework that integrates a structured TCM knowledge graph with chain-of-thought (CoT) reasoning, achieving what is described as the first effective alignment between general TCM knowledge and personalized clinical inference. By combining knowledge graphs, CoT prompting, RAG, and large language models (LLMs), the method outperforms native LLMs, supervised fine-tuned models, and other RAG baselines across multiple TCM datasets. The gains are largest for non-Chinese LLMs, which suggests that the framework can supply the domain-specific reasoning context that models trained mainly on other languages lack.
📝 Abstract
Background: Retrieval-augmented generation (RAG) enables large language models (LLMs) to generate more accurate, professional, and timely responses without fine-tuning. However, because clinical diagnosis and treatment in traditional Chinese medicine (TCM) involve complex reasoning and substantial differences between individual patients, conventional RAG methods often perform poorly in this domain.
Objective: To address the limitations of conventional RAG approaches in TCM applications, this study aims to develop an improved RAG framework tailored to the characteristics of TCM reasoning.
Methods: We developed TCM-DiffRAG, a RAG framework that integrates knowledge graphs (KGs) with chain-of-thought (CoT) reasoning. TCM-DiffRAG was evaluated on three distinct TCM test datasets.
Results: TCM-DiffRAG achieved significant performance improvements over native LLMs. For example, the native qwen-plus model scored 0.927, 0.361, and 0.038, which rose to 0.952, 0.788, and 0.356 with TCM-DiffRAG. The improvements were even more pronounced for non-Chinese LLMs. TCM-DiffRAG also outperformed LLMs trained with supervised fine-tuning (SFT) and other benchmark RAG methods.
Conclusions: TCM-DiffRAG shows that integrating structured TCM knowledge graphs with CoT-based reasoning substantially improves performance in individualized diagnostic tasks. The joint use of universal and personalized knowledge graphs enables effective alignment between general knowledge and clinical reasoning. These results highlight the potential of reasoning-aware RAG frameworks for advancing LLM applications in TCM.
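To make the reported qwen-plus gains concrete, the following minimal Python sketch computes the absolute improvement for each of the three score pairs quoted in the abstract. The abstract names neither the metric nor which score belongs to which dataset, so the position-based labels ("score 1" to "score 3") and the helper name `absolute_gains` are assumptions made only for illustration.

```python
# Scores quoted in the abstract for qwen-plus, in the order given there.
# Which dataset and metric each position refers to is not stated (assumption:
# the two lists are aligned position by position).
native_scores = [0.927, 0.361, 0.038]
diffrag_scores = [0.952, 0.788, 0.356]


def absolute_gains(before, after, ndigits=3):
    """Return the per-position difference after - before, rounded to ndigits."""
    return [round(a - b, ndigits) for b, a in zip(before, after)]


gains = absolute_gains(native_scores, diffrag_scores)

for i, (b, a, g) in enumerate(zip(native_scores, diffrag_scores, gains), start=1):
    print(f"score {i}: {b:.3f} -> {a:.3f} (+{g:.3f})")
```

The absolute gains are 0.025, 0.427, and 0.318. The two largest gains occur where the native model's score was lowest.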