ZhiFangDanTai: Fine-tuning Graph-based Retrieval-Augmented Generation Model for Traditional Chinese Medicine Formula

📅 2025-09-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing traditional Chinese medicine (TCM) prescription generation models rely on coarse-grained instruction datasets that omit critical clinical elements, such as sovereign-minister-assistant-courier herb roles, tongue and pulse diagnostics, and contraindications, leading to incomplete and poorly interpretable outputs. To address this, the authors propose ZhiFangDanTai, a framework integrating graph-based retrieval-augmented generation (GraphRAG) with domain-specific fine-tuning. It comprises two synergistic components: (1) GraphRAG retrieval that synthesizes structured TCM knowledge into concise summaries, and (2) an enhanced, clinically enriched instruction dataset that trains the LLM to integrate the retrieved information. The authors also provide theoretical proofs that combining GraphRAG with fine-tuning reduces generalization error and hallucination rates on the TCM formula task. Experiments on both collected and clinical datasets demonstrate consistent improvements over state-of-the-art baselines, particularly in the completeness and clinical credibility of generated prescriptions, covering herb composition, therapeutic functions, contraindications, and compatibility roles. The model and dataset are publicly released to support real-world clinical decision support.

📝 Abstract
Traditional Chinese Medicine (TCM) formulas play a significant role in treating epidemics and complex diseases. Existing models for TCM utilize traditional algorithms or deep learning techniques to analyze formula relationships, yet lack comprehensive results, such as complete formula compositions and detailed explanations. Although recent efforts have used TCM instruction datasets to fine-tune Large Language Models (LLMs) for explainable formula generation, existing datasets lack sufficient details, such as the roles of the formula's sovereign, minister, assistant, and courier; efficacy; contraindications; and tongue and pulse diagnosis, limiting the depth of model outputs. To address these challenges, we propose ZhiFangDanTai, a framework combining Graph-based Retrieval-Augmented Generation (GraphRAG) with LLM fine-tuning. ZhiFangDanTai uses GraphRAG to retrieve and synthesize structured TCM knowledge into concise summaries, while also constructing an enhanced instruction dataset to improve LLMs' ability to integrate retrieved information. Furthermore, we provide novel theoretical proofs demonstrating that integrating GraphRAG with fine-tuning techniques can reduce generalization error and hallucination rates in the TCM formula task. Experimental results on both collected and clinical datasets demonstrate that ZhiFangDanTai achieves significant improvements over state-of-the-art models. Our model is open-sourced at https://huggingface.co/tczzx6/ZhiFangDanTai1.0.
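The retrieve-then-generate pattern the abstract describes (retrieve structured TCM knowledge from a graph, synthesize it into a concise summary, and prepend it to the LLM prompt) can be illustrated with a minimal sketch. The toy knowledge graph, entity names, and string-based summarization below are illustrative assumptions, not the paper's actual data, graph schema, or implementation.

```python
# Illustrative GraphRAG-style pipeline sketch: retrieve a subgraph for the
# entities mentioned in a query, flatten it into a textual summary, and
# build an augmented prompt for a downstream LLM.

# Toy TCM knowledge graph: entity -> list of (relation, neighbor) edges.
# Entities and edges are hypothetical examples.
KNOWLEDGE_GRAPH = {
    "Gui Zhi Tang": [
        ("sovereign_herb", "Gui Zhi"),
        ("minister_herb", "Bai Shao"),
        ("treats", "exterior deficiency syndrome"),
    ],
    "Gui Zhi": [
        ("function", "releases the exterior"),
        ("contraindication", "heat syndromes"),
    ],
}


def retrieve_subgraph(query: str) -> list:
    """Return (head, relation, tail) triples for entities mentioned in the query."""
    triples = []
    for entity, edges in KNOWLEDGE_GRAPH.items():
        if entity.lower() in query.lower():
            triples.extend((entity, rel, tail) for rel, tail in edges)
    return triples


def synthesize_summary(triples) -> str:
    """Flatten retrieved triples into a concise summary for the LLM prompt."""
    return "; ".join(f"{h} --{r}--> {t}" for h, r, t in triples)


def build_prompt(query: str) -> str:
    """Augment the user query with graph-derived context before generation."""
    context = synthesize_summary(retrieve_subgraph(query))
    return f"Context: {context}\nQuestion: {query}"


prompt = build_prompt("What are the roles and contraindications in Gui Zhi Tang?")
```

In the paper's actual framework the summary would be consumed by a fine-tuned LLM trained on the enhanced instruction dataset; here the prompt string is simply returned to show how retrieved graph facts ground the generation step.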
Problem

Research questions and friction points this paper is trying to address.

Insufficient details in TCM datasets limit model output depth
Existing models fail to produce complete formula compositions and detailed explanations
Need to reduce generalization error and hallucination in TCM tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining GraphRAG with fine-tuning for TCM knowledge retrieval
Constructing enhanced instruction dataset for detailed formula information
Providing theoretical proofs for reduced error and hallucination rates