SNOMED CT-powered Knowledge Graphs for Structured Clinical Data and Diagnostic Reasoning

📅 2025-10-19
🤖 AI Summary
Unstructured clinical text introduces substantial data noise, terminological inconsistency, and logical fragmentation, hindering robust AI deployment in healthcare. To address these challenges, we propose a knowledge graph construction framework that integrates the SNOMED CT standardized terminology with the Neo4j graph database. Using NLP-driven entity-relation extraction, our method represents clinical concepts (including diseases, symptoms, and medications) and their semantic relationships as a graph, enabling multi-hop reasoning and terminological normalization. We further generate a JSON training dataset from the graph and use it to fine-tune large language models (LLMs) for diagnostic reasoning. The work is presented as the first implementation of computationally executable SNOMED CT relationship modeling within a graph database, closing the loop for multi-hop clinical inference. Experimental results demonstrate significant improvements in the logical accuracy and interpretability of generated diagnostic pathways, offering a scalable, trustworthy paradigm for AI-assisted clinical decision support systems.

📝 Abstract
The effectiveness of artificial intelligence (AI) in healthcare is significantly hindered by unstructured clinical documentation, which results in noisy, inconsistent, and logically fragmented training data. To address this challenge, we present a knowledge-driven framework that integrates the standardized clinical terminology SNOMED CT with the Neo4j graph database to construct a structured medical knowledge graph. In this graph, clinical entities such as diseases, symptoms, and medications are represented as nodes, and semantic relationships such as "caused by," "treats," and "belongs to" are modeled as edges in Neo4j, with types mapped from formal SNOMED CT relationship concepts (e.g., Causative agent, Indicated for). This design enables multi-hop reasoning and ensures terminological consistency. By extracting and standardizing entity-relationship pairs from clinical texts, we generate structured, JSON-formatted datasets that embed explicit diagnostic pathways. These datasets are used to fine-tune large language models (LLMs), significantly improving the clinical logic consistency of their outputs. Experimental results demonstrate that our knowledge-guided approach enhances the validity and interpretability of AI-generated diagnostic reasoning, providing a scalable solution for building reliable AI-assisted clinical systems.
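
To make the node-and-edge design concrete, here is a minimal sketch of typed edges and multi-hop traversal. It uses an in-memory Python graph rather than Neo4j, and everything in it is an assumption for illustration: the edge-type spellings (CAUSED_BY, TREATS, BELONGS_TO), the pairing of "Is a" with "belongs to", the example triples, and the function names build_graph and find_path. None of it is taken from the paper's implementation or from a SNOMED CT release.

```python
from collections import deque

# Hypothetical mapping from SNOMED CT relationship names to edge types.
# "Causative agent" and "Indicated for" are named in the abstract; pairing
# "Is a" with "belongs to" is an assumption made for this sketch.
EDGE_TYPE = {
    "Causative agent": "CAUSED_BY",
    "Indicated for": "TREATS",
    "Is a": "BELONGS_TO",
}

# Illustrative triples only (source concept, relationship, target concept).
TRIPLES = [
    ("Bacterial pneumonia", "Causative agent", "Bacteria"),
    ("Bacterial pneumonia", "Is a", "Pneumonia"),
    ("Pneumonia", "Is a", "Lung disease"),
    ("Antibiotic", "Indicated for", "Bacterial pneumonia"),
]


def build_graph(triples):
    """Build an adjacency list: node -> list of (edge_type, neighbor)."""
    graph = {}
    for source, relation, target in triples:
        graph.setdefault(source, []).append((EDGE_TYPE[relation], target))
        graph.setdefault(target, [])  # make sure leaf nodes exist as keys
    return graph


def find_path(graph, start, goal, max_hops=3):
    """Breadth-first search for a shortest directed path of at most max_hops.

    Returns a list of (source, edge_type, target) hops, or None if no path
    within the hop limit exists.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue  # hop budget exhausted on this branch
        for edge_type, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, edge_type, neighbor)]))
    return None


graph = build_graph(TRIPLES)
path = find_path(graph, "Antibiotic", "Lung disease")
for source, edge_type, target in path:
    print(f"{source} -[{edge_type}]-> {target}")
```

In a Neo4j deployment the same question would typically be expressed as a variable-length pattern match in Cypher; the breadth-first search here only stands in for that traversal so the example runs without a database.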
Problem

Research questions and friction points this paper is trying to address.

Addresses unstructured clinical data hindering AI effectiveness
Constructs SNOMED CT-based knowledge graphs for diagnostic reasoning
Improves LLM output consistency through structured clinical pathways
Innovation

Methods, ideas, or system contributions that make the work stand out.

SNOMED CT-powered knowledge graph with Neo4j database
Multi-hop reasoning with standardized clinical terminology
Fine-tuning LLMs using structured diagnostic pathway datasets
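
As a rough illustration of a structured diagnostic pathway dataset, the sketch below serializes one graph path into a JSON record suitable for a JSON Lines file. The field names (instruction, reasoning_path, answer), the helper path_to_record, and the example hops are hypothetical; the paper states only that the datasets are JSON-formatted and embed explicit diagnostic pathways.

```python
import json


def path_to_record(question, hops):
    """Turn a list of (source, relation, target) hops into one training record.

    The record schema is an assumption made for this sketch.
    """
    if not hops:
        raise ValueError("a diagnostic pathway needs at least one hop")
    return {
        "instruction": question,
        # One human-readable string per hop keeps the pathway explicit.
        "reasoning_path": [f"{s} -[{r}]-> {t}" for s, r, t in hops],
        # The final target of the path is treated as the answer.
        "answer": hops[-1][2],
    }


# Illustrative hops only; not data from the paper.
example_hops = [
    ("Antibiotic", "TREATS", "Bacterial pneumonia"),
    ("Bacterial pneumonia", "BELONGS_TO", "Pneumonia"),
]

record = path_to_record(
    "Which disease category does this medication address?", example_hops
)
json_line = json.dumps(record, ensure_ascii=False)  # one line per record
print(json_line)
```

Storing each hop as an explicit string is one possible design; it keeps the reasoning chain visible in the training target so a fine-tuned model is asked to reproduce the pathway and not only the final answer.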
Dun Liu
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Guangdong, China
Qin Pang
The Chinese University of Hong Kong, Shenzhen Hospital
Guangai Liu
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Guangdong, China
Hongyu Mou
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Guangdong, China
Jipeng Fan
Chengdu Chengdian Goldisk Health Data Technology Co., Ltd.
Yiming Miao
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China
Pin-Han Ho
University of Waterloo
Limei Peng
Kyungpook National University, South Korea