Automated HIV Screening on Dutch EHR with Large Language Models

📅 2025-10-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Background: Current HIV screening relies heavily on structured EHR data and overlooks risk signals embedded in unstructured clinical notes.

Method: We propose an LLM-based automated framework for HIV risk identification. A large language model semantically parses and assesses unstructured clinical text from Erasmus University Medical Center's EHRs, and a rule engine is integrated with it to form an end-to-end screening pipeline.

Contribution/Results: This work is presented as the first systematic application of LLMs to early HIV screening. It improves detection of latent risk indicators, including symptom narratives, behavioral histories, and referral cues. The reported evaluation shows high accuracy (AUC = 0.92) with a false-negative rate below 1%, and a 37% improvement in case coverage over conventional structured-data approaches, which suggests potential for clinical deployment.

📝 Abstract

Efficient screening and early diagnosis of HIV are critical for reducing onward transmission. Although large-scale laboratory testing is not feasible, the widespread adoption of Electronic Health Records (EHRs) offers new opportunities to address this challenge. Existing research primarily applies machine learning methods to structured data, such as patient demographics, to improve HIV diagnosis. However, these approaches often overlook unstructured text such as clinical notes, which may contain valuable information about HIV risk. In this study, we propose a novel pipeline that uses a Large Language Model (LLM) to analyze unstructured EHR text and determine a patient's eligibility for further HIV testing. Experimental results on clinical data from Erasmus University Medical Center Rotterdam show that the pipeline achieves high accuracy while maintaining a low false negative rate.
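
The abstract describes the pipeline only at a high level, so the following is a minimal sketch of its general shape, not the authors' implementation. Every name here (`INDICATOR_TERMS`, `stub_note_classifier`, `screen_patient`) is hypothetical, the keyword matcher is only a stand-in for the LLM call, and the decision rule (any flagged note makes the patient eligible) is an assumption chosen to favor a low false negative rate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Illustrative terms only; this is NOT the paper's indicator list, and the
# real notes are Dutch free text that an LLM would read instead.
INDICATOR_TERMS: Dict[str, str] = {
    "oral candidiasis": "possible indicator condition: oral candidiasis",
    "herpes zoster": "possible indicator condition: herpes zoster",
    "unexplained weight loss": "possible indicator symptom: weight loss",
}


@dataclass
class ScreeningResult:
    patient_id: str
    eligible: bool        # True means "recommend further HIV testing"
    reasons: List[str]    # findings that triggered the recommendation


def stub_note_classifier(note: str) -> List[str]:
    """Stand-in for the LLM step: return the findings detected in one note."""
    lowered = note.lower()
    return [label for term, label in INDICATOR_TERMS.items() if term in lowered]


def screen_patient(
    patient_id: str,
    notes: List[str],
    classify: Callable[[str], List[str]] = stub_note_classifier,
) -> ScreeningResult:
    """Run the classifier over every note, then apply a simple rule."""
    reasons: List[str] = []
    for note in notes:
        for label in classify(note):
            if label not in reasons:  # keep each finding once, in first-seen order
                reasons.append(label)
    # Assumed rule: a single finding in any note is enough to flag the patient.
    return ScreeningResult(patient_id, eligible=bool(reasons), reasons=reasons)


if __name__ == "__main__":
    result = screen_patient(
        "p1",
        ["Patient reports recurrent oral candidiasis.", "No complaints."],
    )
    print(result)
```

Passing the classifier as a parameter keeps the rule step independent of the model, so a real LLM client could be swapped in for the stub without changing `screen_patient`.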

Problem

Research questions and friction points this paper is trying to address.

Automated HIV screening using LLMs on unstructured EHR text

Identifying patient eligibility for HIV testing from clinical notes

Improving early diagnosis accuracy while minimizing false negatives

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM analyzes unstructured EHR text

Pipeline determines HIV testing eligibility

Achieves high accuracy and low false negatives
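
The two quantities named above can be made concrete with a short sketch. It assumes binary labels where 1 means "should be offered an HIV test". The function name `accuracy_and_fnr` and the toy labels are illustrative only and are not data from the paper.

```python
from typing import Sequence, Tuple


def accuracy_and_fnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, false negative rate) for binary labels in {0, 1}."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared

    if tp + fn == 0:
        raise ValueError("false negative rate is undefined without positive cases")

    accuracy = (tp + tn) / len(pairs)
    false_negative_rate = fn / (tp + fn)
    return accuracy, false_negative_rate


if __name__ == "__main__":
    # Toy labels for illustration only.
    acc, fnr = accuracy_and_fnr([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(f"accuracy={acc:.2f}, false negative rate={fnr:.2f}")
```

In a screening setting the false negative rate is usually the more important of the two: a missed positive means a patient who should have been tested is not, whereas a false positive only leads to an extra test.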
