Patient-level Information Extraction by Consistent Integration of Textual and Tabular Evidence with Bayesian Networks

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses two challenges: integrating structured electronic health record (EHR) data (e.g., diagnosis codes, lab values) with unstructured clinical text (e.g., discharge summaries, nursing notes), and keeping the resulting models interpretable. It proposes a multimodal fusion approach that combines (1) an expert-informed Bayesian network over the tabular features, (2) neural text classifiers that predict patient symptoms from clinical notes, and (3) virtual evidence augmented with a consistency node, which fuses the two models' predictions probabilistically. On the simulated SimSUM benchmark, the consistency node improves the calibration of the final predictions compared to virtual evidence alone, and it helps the Bayesian network adjust the classifier's output when information is missing or when the tabular and text data contradict each other.

📝 Abstract
Electronic health records (EHRs) form an invaluable resource for training clinical decision support systems. To leverage the potential of such systems in high-risk applications, we need large, structured tabular datasets on which we can build transparent feature-based models. While part of the EHR already contains structured information (e.g. diagnosis codes, medications, and lab results), much of the information is contained within unstructured text (e.g. discharge summaries and nursing notes). In this work, we propose a method for multi-modal patient-level information extraction that leverages both the tabular features available in the patient's EHR (using an expert-informed Bayesian network) as well as clinical notes describing the patient's symptoms (using neural text classifiers). We propose the use of virtual evidence augmented with a consistency node to provide an interpretable, probabilistic fusion of the models' predictions. The consistency node improves the calibration of the final predictions compared to virtual evidence alone, allowing the Bayesian network to better adjust the neural classifier's output to handle missing information and resolve contradictions between the tabular and text data. We show the potential of our method on the SimSUM dataset, a simulated benchmark linking tabular EHRs with clinical notes through expert knowledge.
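As a rough illustration of the virtual-evidence idea described in the abstract, the sketch below fuses a tabular prior with a text classifier's output in a toy two-node network (disease causes symptom). The network, the probabilities, and all function names are hypothetical, and treating the classifier's probability as a likelihood on the symptom node is a simplifying assumption. The consistency node is not reproduced here, because this page does not define it.

```python
# Toy sketch of virtual evidence on a binary symptom node S with parent D.
# All names and numbers are illustrative assumptions, not the paper's model.

def symptom_prior(p_disease, p_s_given_d, p_s_given_not_d):
    """P(S=1) implied by the tabular side of the network."""
    return p_disease * p_s_given_d + (1.0 - p_disease) * p_s_given_not_d


def fuse_with_virtual_evidence(prior_s, clf_prob):
    """P(S=1 | virtual evidence), using clf_prob : (1 - clf_prob)
    as the likelihood ratio contributed by the text classifier."""
    num = prior_s * clf_prob
    den = num + (1.0 - prior_s) * (1.0 - clf_prob)
    if den == 0.0:
        raise ValueError("prior and classifier output fully contradict each other")
    return num / den


def disease_posterior(p_disease, p_s_given_d, p_s_given_not_d, clf_prob):
    """P(D=1 | virtual evidence on S), marginalising over the symptom."""
    like_d = p_s_given_d * clf_prob + (1.0 - p_s_given_d) * (1.0 - clf_prob)
    like_not_d = (p_s_given_not_d * clf_prob
                  + (1.0 - p_s_given_not_d) * (1.0 - clf_prob))
    num = p_disease * like_d
    den = num + (1.0 - p_disease) * like_not_d
    if den == 0.0:
        raise ValueError("evidence has zero probability under the network")
    return num / den


if __name__ == "__main__":
    prior = symptom_prior(0.2, 0.9, 0.1)
    print("tabular prior on symptom:", prior)
    print("fused with classifier 0.8:", fuse_with_virtual_evidence(prior, 0.8))
    print("disease posterior:", disease_posterior(0.2, 0.9, 0.1, 0.8))
```

In this sketch, a classifier output of 0.5 carries no information, so the posterior falls back to the tabular prior. That is one simple way to picture how a Bayesian network can cope with a note that says nothing about a symptom.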
Problem

Research questions and friction points this paper is trying to address.

Extracting structured patient-level information from unstructured clinical notes
Integrating tabular EHR features with textual evidence using Bayesian networks
Resolving contradictions between structured and unstructured medical data sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-informed Bayesian network integrates tabular features with neural text classifier predictions
Virtual evidence with consistency node improves calibration
Probabilistic fusion handles missing information and contradictions
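
The calibration claim above can be made concrete with a standard metric. The sketch below computes a binned expected calibration error (ECE) for binary predictions. This page does not say which calibration metric the paper uses, so the choice of ECE, the bin count, and the function name are assumptions made for illustration only.

```python
# Binned expected calibration error for binary probabilities.
# Illustrative only: the metric and bin count are assumptions, not the paper's setup.

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and
    observed positive rate, computed over equal-width probability bins."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(mean_prob - pos_rate)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0]))
```

A lower value means predicted probabilities track observed frequencies more closely, which is the sense in which one fusion scheme can be called better calibrated than another.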
Authors

Paloma Rabaey
Faculty of Engineering and Architecture, Ghent University, Ghent, Belgium.
Adrick Tench
Faculty of Engineering and Architecture, Ghent University, Ghent, Belgium.
Stefan Heytens
Ghent University Hospital, Ghent, Belgium.
Thomas Demeester
Associate professor, Ghent University - imec
Artificial Intelligence, Natural Language Processing (past: electromagnetics)