Enhancing LLMs for Identifying and Prioritizing Important Medical Jargons from Electronic Health Record Notes Utilizing Data Augmentation

📅 2025-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of identifying and prioritizing medical terms in patient-readable electronic health record (EHR) notes under low-resource conditions. Method: We propose a systematic comparative framework that unifies the evaluation of prompt engineering (structured and few-shot prompting), LoRA fine-tuning, and ChatGPT-driven data augmentation for medical term extraction, and integrates learning-to-rank for importance scoring, validated via five-fold cross-validation. Contribution/Results: Data augmentation substantially boosts open-source model performance: Mistral-7B achieves an MRR of 0.746 post-augmentation, surpassing GPT-4 Turbo, while GPT-4 Turbo attains the highest F1 score (0.433). Critically, the optimal strategies for F1 and MRR are decoupled, indicating that term identification and ranking require distinct modeling approaches. Our work establishes a new paradigm for lightweight, interpretable, patient-facing EHR understanding.

📝 Abstract
Objective: OpenNotes enables patients to access EHR notes, but medical jargon can hinder comprehension. To improve understanding, we evaluated closed- and open-source LLMs for extracting and prioritizing key medical terms using prompting, fine-tuning, and data augmentation. Materials and Methods: We assessed LLMs on 106 expert-annotated EHR notes, experimenting with (i) general vs. structured prompts, (ii) zero-shot vs. few-shot prompting, (iii) fine-tuning, and (iv) data augmentation. To enhance open-source models in low-resource settings, we used ChatGPT for data augmentation and applied ranking techniques. We incrementally increased the augmented dataset size (10 to 10,000) and conducted 5-fold cross-validation, reporting F1 score and Mean Reciprocal Rank (MRR). Results and Discussion: Fine-tuning and data augmentation improved performance over other strategies. GPT-4 Turbo achieved the highest F1 (0.433), while Mistral-7B with data augmentation had the highest MRR (0.746). Open-source models, when fine-tuned or augmented, outperformed closed-source models. Notably, the best F1 and MRR scores did not always align. Few-shot prompting outperformed zero-shot in vanilla models, and structured prompts yielded different preferences across models. Fine-tuning improved zero-shot performance but sometimes degraded few-shot performance. Data augmentation performed comparably or better than other methods. Conclusion: Our evaluation highlights the effectiveness of prompting, fine-tuning, and data augmentation in improving model performance for medical jargon extraction in low-resource scenarios.
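The two reported metrics can be sketched as follows. This is a minimal illustration, assuming set-level term matching per note and a simple macro average; the paper's exact normalization and fold-aggregation details are not specified here.

```python
def term_f1(predicted, gold):
    """Set-level F1 between predicted and gold term sets for one note."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: terms in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(ranked_lists, gold_sets):
    """Average reciprocal rank of the first gold term in each ranked list."""
    rr_sum = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        for i, term in enumerate(ranked, start=1):
            if term in gold:
                rr_sum += 1.0 / i  # reciprocal rank of first hit
                break
    return rr_sum / len(ranked_lists)
```

MRR rewards placing an important gold term near the top of the ranked output, which is why the best extraction (F1) and ranking (MRR) strategies can diverge.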
Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs for medical jargon identification
Prioritize key terms from EHR notes
Utilize data augmentation for low-resource settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilized data augmentation for LLM enhancement
Applied fine-tuning to improve model accuracy
Implemented structured prompts for better outcomes
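A structured, optionally few-shot prompt of the kind compared in the paper can be assembled as below. The instruction wording and the helper name `build_jargon_prompt` are illustrative assumptions, not the paper's actual prompt template.

```python
def build_jargon_prompt(note, examples=None):
    """Assemble a structured prompt for medical jargon extraction.

    `examples` is an optional list of (note, ranked_terms) pairs that turns
    the zero-shot prompt into a few-shot one.
    """
    parts = [
        "You are a medical language expert.",
        "Task: list the medical jargon terms in the EHR note below,",
        "ordered from most to least important for patient understanding.",
        "Output one term per line.",
    ]
    for ex_note, ex_terms in (examples or []):
        parts.append("\nNote: " + ex_note + "\nTerms:\n" + "\n".join(ex_terms))
    parts.append("\nNote: " + note + "\nTerms:")
    return "\n".join(parts)
```

Swapping the `examples` argument in and out reproduces the zero-shot vs. few-shot conditions, while the fixed instruction header plays the role of the structured prompt.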
Won Seok Jang
PhD student, University of Massachusetts Lowell
natural language processing, healthcare
Sharmin Sultana
Miner School of Computer and Information Sciences, UMass Lowell, MA, USA
Zonghai Yao
UMass Amherst
Medical-LLM, Multi-agent AI Hospital, Clinical Reasoning, Synthetic Data, Patient Education
Hieu Tran
University of Maryland, College Park
Natural Language Processing, Large Language Models
Zhichao Yang
Manning College of Information and Computer Sciences, UMass Amherst, MA, USA
Sunjae Kwon
UMass Amherst
Machine Learning, Natural Language Processing, Lexical Semantics, Public Health, AI in Healthcare
Hong Yu
Miner School of Computer and Information Sciences, UMass Lowell, MA, USA; Center for Healthcare Organization and Implementation Research, VA Bedford Health Care, MA, USA