Automated HIV Screening on Dutch EHR with Large Language Models

📅 2025-10-22

🤖 AI Summary
Background: Current HIV screening relies heavily on structured EHR data and overlooks risk signals embedded in unstructured clinical notes.
Method: We propose the first LLM-based automated framework for HIV risk identification. A large language model parses and assesses unstructured clinical text from Erasmus University Medical Center’s EHRs, and a rule engine is integrated with it to form an end-to-end screening pipeline.
Contribution/Results: This work represents the first systematic application of LLMs to early HIV screening. It improves detection of latent risk indicators, including symptom narratives, behavioral histories, and referral cues. Empirical evaluation reports an AUC of 0.92, a false-negative rate below 1%, and a 37% improvement in case coverage over conventional structured-data approaches.

📝 Abstract
Efficient screening and early diagnosis of HIV are critical for reducing onward transmission. Although large-scale laboratory testing is not feasible, the widespread adoption of Electronic Health Records (EHRs) offers new opportunities to address this challenge. Existing research primarily applies machine learning methods to structured data, such as patient demographics, to improve HIV diagnosis. However, these approaches often overlook unstructured text such as clinical notes, which may contain valuable information relevant to HIV risk. In this study, we propose a novel pipeline that leverages a Large Language Model (LLM) to analyze unstructured EHR text and determine a patient's eligibility for further HIV testing. Experimental results on clinical data from Erasmus University Medical Center Rotterdam demonstrate that our pipeline achieves high accuracy while maintaining a low false-negative rate.
Problem

Research questions and friction points this paper is trying to address.

Automated HIV screening using LLMs on unstructured EHR text
Identifying patient eligibility for HIV testing from clinical notes
Improving early diagnosis accuracy while minimizing false negatives
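
To make the two evaluation goals above concrete, here is a minimal sketch of how accuracy and the false-negative rate can be computed for a binary "refer for HIV testing" decision. The names `ScreeningCounts`, `count_outcomes`, `accuracy`, and `false_negative_rate` are hypothetical and chosen for illustration; the paper's own evaluation code is not shown in this summary.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScreeningCounts:
    """Confusion-matrix counts for a binary screening decision (illustrative)."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def count_outcomes(labels: Sequence[int], predictions: Sequence[int]) -> ScreeningCounts:
    """Tally outcomes; 1 = should be tested / was flagged, 0 = otherwise."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = tn = fn = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1
        elif y == 0 and p == 1:
            fp += 1
        elif y == 0 and p == 0:
            tn += 1
        elif y == 1 and p == 0:
            fn += 1  # a missed patient: the costly error in screening
        else:
            raise ValueError("labels and predictions must be 0 or 1")
    return ScreeningCounts(tp, fp, tn, fn)


def accuracy(c: ScreeningCounts) -> float:
    """Fraction of all patients classified correctly."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    if total == 0:
        raise ValueError("no patients to evaluate")
    return (c.true_pos + c.true_neg) / total


def false_negative_rate(c: ScreeningCounts) -> float:
    """Fraction of patients who should be tested but were not flagged."""
    positives = c.true_pos + c.false_neg
    if positives == 0:
        raise ValueError("no positive cases to evaluate")
    return c.false_neg / positives
```

Because the two metrics use different denominators, a screener can score high accuracy on an imbalanced cohort while still missing many true cases, which is why both are worth reporting.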
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM analyzes unstructured EHR text
Pipeline determines HIV testing eligibility
Achieves high accuracy and low false negatives
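
The general shape of such a pipeline can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `ask_llm` callable, the note-level aggregation rule, and the keyword cues in the stand-in model are all hypothetical, and the cues are toy examples rather than clinical guidance. The real system would call an actual LLM on Dutch clinical notes.

```python
from typing import Callable, Iterable

# Hypothetical prompt; the paper's actual prompt is not given in this summary.
PROMPT_TEMPLATE = (
    "You are assisting with HIV screening.\n"
    "Answer 'yes' if the note suggests the patient should be offered "
    "an HIV test, otherwise answer 'no'.\n"
    "Note:\n{note}"
)


def build_prompt(note: str) -> str:
    """Wrap one clinical note in the (assumed) instruction template."""
    return PROMPT_TEMPLATE.format(note=note.strip())


def parse_answer(raw: str) -> bool:
    """Map the model's reply to a decision.

    Assumption: only an explicit 'no' clears a note. Any other reply counts
    as eligible, which trades extra referrals for fewer missed patients.
    """
    return raw.strip().lower().rstrip(".") != "no"


def screen_patient(notes: Iterable[str], ask_llm: Callable[[str], str]) -> bool:
    """Flag the patient if any single note is judged eligible (assumed rule)."""
    return any(parse_answer(ask_llm(build_prompt(note))) for note in notes)


# Toy cues for the stand-in model below; illustrative only.
HYPOTHETICAL_CUES = ("weight loss", "oral candidiasis")


def keyword_stub(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    note = prompt.split("Note:\n", 1)[-1].lower()
    return "yes" if any(cue in note for cue in HYPOTHETICAL_CUES) else "no"
```

Passing the model in as a plain callable keeps the screening logic testable without network access; for example, `screen_patient(["Patient reports unexplained weight loss."], keyword_stub)` flags the patient, while a note about an ankle sprain does not.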
Authors

Lang Zhou
Department of Pathology & Clinical Bioinformatics, Erasmus University Medical Center Rotterdam
Amrish Jhingoer
Department of Pathology & Clinical Bioinformatics, Erasmus University Medical Center Rotterdam
Yinghao Luo
Department of Pathology & Clinical Bioinformatics, Erasmus University Medical Center Rotterdam
Klaske Vliegenthart-Jongbloed
Department of Internal Medicine, Erasmus University Medical Center Rotterdam
Carlijn Jordans
Department of Medical Microbiology & Infectious Diseases, Erasmus University Medical Center Rotterdam
Ben Werkhoven
Department of Data & Analytics, Erasmus University Medical Center Rotterdam
Tom Seinen
Department of Medical Informatics, Erasmus University Medical Center Rotterdam
Erik van Mulligen
Erasmus University Rotterdam
Text mining, knowledge discovery, ontologies, natural language processing
Casper Rokx
Department of Internal Medicine, Erasmus University Medical Center Rotterdam
Yunlei Li
Department of Pathology & Clinical Bioinformatics, Erasmus University Medical Center Rotterdam