Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of constructing interoperable patient digital twins from unstructured electronic health records (EHRs), a task hindered by heterogeneous clinical text and the lack of standardized mappings. The authors propose an end-to-end semantic natural language processing (NLP) pipeline that targets the Fast Healthcare Interoperability Resources (FHIR) standard. By combining named entity recognition, concept normalization to SNOMED-CT or ICD-10, and relation extraction, the pipeline transforms free-text clinical notes into structured FHIR resources. Evaluated on the MIMIC-IV Clinical Database Demo and validated against MIMIC-IV-on-FHIR reference mappings, the approach reports high F1 scores for entity and relation extraction, along with improved schema completeness and interoperability compared to baseline methods.
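Since F1 on entity and relation extraction is the headline metric, a minimal sketch of how such a score is commonly computed may help. It assumes exact-match scoring over sets of extracted tuples, which this page does not confirm as the paper's matching criterion. The function name `prf1` and the example tuples are made up for illustration.

```python
def prf1(predicted, gold):
    """Precision, recall, and F1 over two collections of hashable items (exact match)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up relation triples: one of two predictions matches one of two gold triples.
gold = {("metformin", "treats", "type 2 diabetes"),
        ("lisinopril", "treats", "hypertension")}
pred = {("metformin", "treats", "type 2 diabetes"),
        ("metformin", "treats", "hypertension")}

precision, recall, f1 = prf1(pred, gold)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

The same function works for entity extraction if each item is, for example, a `(span_text, code)` pair instead of a relation triple.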

📝 Abstract
Digital twins, virtual replicas of physical entities, are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generating interoperable patient digital twins from unstructured electronic health records (EHRs) remains challenging due to variability in clinical documentation and the lack of standardized mappings. This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into FHIR-compliant digital twin representations. The pipeline leverages named entity recognition (NER) to extract clinical concepts, concept normalization to map entities to SNOMED-CT or ICD-10, and relation extraction to capture structured associations between conditions, medications, and observations. Evaluation on the MIMIC-IV Clinical Database Demo, with validation against MIMIC-IV-on-FHIR reference mappings, demonstrates high F1-scores for entity and relation extraction, with improved schema completeness and interoperability compared to baseline methods.
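The three stages described in the abstract (NER, concept normalization, relation extraction) and the final FHIR mapping can be illustrated with a toy sketch. This is not the authors' implementation: a dictionary lookup stands in for the NER and normalization models, a single "medication ... for ... condition" rule stands in for relation extraction, and the names `LEXICON`, `extract_entities`, `extract_relations`, `to_fhir`, and `build_twin` are hypothetical. Field names assume FHIR R4, the `"active"` status is an assumption, and the terminology codes are illustrative and should be verified against a terminology server.

```python
import re

SNOMED = "http://snomed.info/sct"
ICD10 = "http://hl7.org/fhir/sid/icd-10"

# Hypothetical toy lexicon: surface form -> (resource kind, system, code, display).
LEXICON = {
    "hypertension": ("Condition", ICD10, "I10", "Essential (primary) hypertension"),
    "type 2 diabetes": ("Condition", SNOMED, "44054006", "Diabetes mellitus type 2"),
    "metformin": ("Medication", SNOMED, "372567009", "Metformin"),
}


def extract_entities(sentence):
    """Toy NER + normalization: find lexicon terms and attach their codes."""
    lowered = sentence.lower()
    found = []
    for term, (kind, system, code, display) in LEXICON.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            coding = {"system": system, "code": code, "display": display}
            found.append({"text": term, "kind": kind,
                          "start": match.start(), "coding": coding})
    return sorted(found, key=lambda e: e["start"])


def extract_relations(sentence, entities):
    """Toy relation rule: a medication followed by ' for ' and then a condition."""
    lowered = sentence.lower()
    relations = []
    for med in (e for e in entities if e["kind"] == "Medication"):
        for cond in (e for e in entities if e["kind"] == "Condition"):
            if med["start"] < cond["start"] and " for " in lowered[med["start"]:cond["start"]]:
                relations.append((med["text"], "treats", cond["text"]))
    return relations


def to_fhir(patient_id, entities, relations):
    """Map entities and relations to FHIR-style dicts (R4 field names assumed)."""
    subject = {"reference": f"Patient/{patient_id}"}
    resources, cond_ids = [], {}
    for ent in entities:
        if ent["kind"] == "Condition":
            cond_ids[ent["text"]] = f"cond-{len(cond_ids) + 1}"
            resources.append({"resourceType": "Condition", "id": cond_ids[ent["text"]],
                              "subject": subject, "code": {"coding": [ent["coding"]]}})
    for ent in entities:
        if ent["kind"] == "Medication":
            statement = {"resourceType": "MedicationStatement",
                         "status": "active",  # assumption: status is not inferred here
                         "subject": subject,
                         "medicationCodeableConcept": {"coding": [ent["coding"]]}}
            reasons = [{"reference": f"Condition/{cond_ids[c]}"}
                       for m, _, c in relations if m == ent["text"]]
            if reasons:
                statement["reasonReference"] = reasons
            resources.append(statement)
    return resources


def build_twin(patient_id, note):
    """Run the toy pipeline sentence by sentence and collect the resources."""
    entities, relations = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        found = extract_entities(sentence)
        entities.extend(found)
        relations.extend(extract_relations(sentence, found))
    return to_fhir(patient_id, entities, relations)


resources = build_twin("p1", "Patient takes metformin for type 2 diabetes. "
                             "History of hypertension.")
for resource in resources:
    print(resource["resourceType"], resource.get("reasonReference", ""))
```

For the sample note, the sketch yields two `Condition` resources and one `MedicationStatement` whose `reasonReference` points at the diabetes condition. A real system would replace the lexicon and the rule with trained models and validate the output against a FHIR profile.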
Problem

Research questions and friction points this paper is trying to address.

digital twins
unstructured EHRs
interoperability
semantic NLP
FHIR
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic NLP pipeline
patient digital twin
FHIR interoperability
clinical concept normalization
relation extraction
🔎 Similar Papers
2024-05-27 · International Conference on Information and Knowledge Management · Citations: 4
Rafael Brens
Binghamton University
Yuqiao Meng
Binghamton University
Luoxi Tang
Binghamton University
Zhaohan Xi
Binghamton University
AI for Science, Large Language Models, Healthcare AI, Cybersecurity, AI Security