🤖 AI Summary
Existing traditional Chinese medicine (TCM) prescription generation models suffer from coarse-grained instruction datasets that omit critical clinical elements, such as sovereign-minister-assistant-courier herb roles, tongue and pulse diagnostics, and contraindications, leading to incomplete and poorly interpretable outputs. To address this, the authors propose ZhiFangDanTai, a framework combining graph-based retrieval-augmented generation (GraphRAG) with domain-specific fine-tuning. It contributes along two synergistic lines: (1) GraphRAG retrieves and synthesizes structured TCM knowledge into concise summaries, paired with an enhanced, fine-grained instruction dataset that improves the LLM's ability to integrate retrieved information; and (2) novel theoretical proofs showing that combining GraphRAG with fine-tuning reduces generalization error and hallucination rates on the TCM formula generation task. Experiments on both collected and clinical datasets demonstrate significant improvements over state-of-the-art baselines, particularly in the completeness and clinical credibility of generated prescriptions: herb composition, therapeutic functions, contraindications, and compatibility roles. The model is publicly released to support real-world clinical decision support.
📝 Abstract
Traditional Chinese Medicine (TCM) formulas play a significant role in treating epidemics and complex diseases. Existing models for TCM use traditional algorithms or deep learning techniques to analyze formula relationships, yet they fail to produce comprehensive results, such as complete formula compositions and detailed explanations. Although recent efforts have used TCM instruction datasets to fine-tune Large Language Models (LLMs) for explainable formula generation, existing datasets lack sufficient detail, such as the roles of a formula's sovereign, minister, assistant, and courier herbs; efficacy; contraindications; and tongue and pulse diagnosis, limiting the depth of model outputs. To address these challenges, we propose ZhiFangDanTai, a framework combining Graph-based Retrieval-Augmented Generation (GraphRAG) with LLM fine-tuning. ZhiFangDanTai uses GraphRAG to retrieve and synthesize structured TCM knowledge into concise summaries, while also constructing an enhanced instruction dataset to improve LLMs' ability to integrate retrieved information. Furthermore, we provide novel theoretical proofs demonstrating that integrating GraphRAG with fine-tuning techniques can reduce generalization error and hallucination rates in the TCM formula task. Experimental results on both collected and clinical datasets demonstrate that ZhiFangDanTai achieves significant improvements over state-of-the-art models. Our model is open-sourced at https://huggingface.co/tczzx6/ZhiFangDanTai1.0.