🤖 AI Summary
Unstructured clinical text introduces substantial data noise, terminological inconsistency, and logical fragmentation, hindering robust AI deployment in healthcare. To address these challenges, we propose a knowledge graph construction framework that integrates SNOMED CT standardized terminology with the Neo4j graph database. Leveraging NLP-driven entity-relation extraction, our method structurally represents clinical concepts—including diseases, symptoms, and medications—and their semantic relationships, enabling multi-hop reasoning and terminological normalization. We further generate a high-quality JSON training dataset from the graph and employ it to fine-tune large language models (LLMs) for diagnostic reasoning. This work constitutes the first implementation of computationally executable SNOMED CT relationship modeling within a graph database, establishing a closed loop for multi-hop clinical inference. Experimental results demonstrate significant improvements in the logical accuracy and interpretability of generated diagnostic pathways, offering a scalable, trustworthy paradigm for AI-assisted clinical decision support systems.
📝 Abstract
The effectiveness of artificial intelligence (AI) in healthcare is significantly hindered by unstructured clinical documentation, which results in noisy, inconsistent, and logically fragmented training data. To address this challenge, we present a knowledge-driven framework that integrates the standardized clinical terminology SNOMED CT with the Neo4j graph database to construct a structured medical knowledge graph. In this graph, clinical entities such as diseases, symptoms, and medications are represented as nodes, and semantic relationships such as "caused by," "treats," and "belongs to" are modeled as edges in Neo4j, with types mapped from formal SNOMED CT relationship concepts (e.g., "Causative agent," "Indicated for"). This design enables multi-hop reasoning and ensures terminological consistency. By extracting and standardizing entity-relationship pairs from clinical texts, we generate structured, JSON-formatted datasets that embed explicit diagnostic pathways. These datasets are used to fine-tune large language models (LLMs), significantly improving the clinical logic consistency of their outputs. Experimental results demonstrate that our knowledge-guided approach enhances the validity and interpretability of AI-generated diagnostic reasoning, providing a scalable solution for building reliable AI-assisted clinical systems.
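The pipeline described above—triples with SNOMED CT-style relationship types, multi-hop traversal over them, and export of JSON records with explicit diagnostic pathways—can be sketched in miniature. This is an illustrative toy, not the authors' Neo4j/SNOMED CT implementation: the entity names, relationship labels, and JSON schema below are assumptions made up for the example.

```python
import json

# Toy knowledge graph as (subject, relationship, object) triples.
# The relationship labels stand in for SNOMED CT relationship concepts
# (e.g., "caused_by" for a Causative agent-style link); they are
# illustrative, not actual SNOMED CT identifiers.
TRIPLES = [
    ("Fever",     "caused_by",  "Influenza"),
    ("Cough",     "caused_by",  "Influenza"),
    ("Influenza", "treated_by", "Oseltamivir"),
]

def neighbors(node, rel):
    """Objects reachable from `node` via relationship `rel`."""
    return [o for s, r, o in TRIPLES if s == node and r == rel]

def multi_hop(symptom):
    """Two-hop traversal: symptom -> candidate disease -> treatment."""
    paths = []
    for disease in neighbors(symptom, "caused_by"):
        for drug in neighbors(disease, "treated_by"):
            paths.append(
                {"symptom": symptom, "disease": disease, "treatment": drug}
            )
    return paths

# Serialize traversal paths as JSON fine-tuning records that embed an
# explicit diagnostic pathway (a hypothetical schema for illustration).
records = [
    {
        "input": f"Patient presents with {p['symptom'].lower()}.",
        "pathway": [p["symptom"], p["disease"], p["treatment"]],
        "output": f"{p['symptom']} suggests {p['disease']}; "
                  f"consider {p['treatment']}.",
    }
    for p in multi_hop("Fever")
]
print(json.dumps(records, indent=2))
```

In the paper's actual system the traversal step would instead be a Cypher query against Neo4j over SNOMED CT-mapped relationship types; the in-memory list here only mirrors the shape of that reasoning.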