Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

📅 2026-05-05

📝 Abstract
Clinical named entity recognition from dental progress notes is challenging because documentation is highly unstructured, domain-specific, and often privacy-sensitive. We developed a locally deployable framework that enables small language models to self-generate, verify, refine, and evaluate entity-specific prompts for extracting multiple clinical entities from dental notes. Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization (DPO). Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks. Qwen2.5-14B-Instruct achieved the strongest baseline performance. After DPO, Qwen2.5-14B-Instruct and Llama-3.1-8B-Instruct achieved micro/macro F1 scores of 0.864/0.837 and 0.806/0.797, respectively. These findings suggest that automated prompt optimization combined with lightweight preference-based post-training can support scalable clinical information extraction using locally deployed small language models.
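The abstract reports both micro and macro F1. As a minimal sketch of how the two averages differ, assume per-entity-type counts of true positives, false positives, and false negatives are available. The entity names and counts below are hypothetical and are not taken from the paper, and the paper's exact matching criteria are not stated in the abstract.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float]:
    """counts maps entity type -> (tp, fp, fn)."""
    # Micro: pool the counts over all entity types, then compute one F1.
    total_tp = sum(c[0] for c in counts.values())
    total_fp = sum(c[1] for c in counts.values())
    total_fn = sum(c[2] for c in counts.values())
    micro = f1(total_tp, total_fp, total_fn)
    # Macro: compute F1 per entity type, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Hypothetical counts for two illustrative entity types (not from the paper).
counts = {
    "procedure": (90, 10, 10),   # frequent entity, extracted well
    "medication": (10, 10, 10),  # rare entity, extracted poorly
}
micro, macro = micro_macro_f1(counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the micro score is pulled up by the frequent entity type, while the macro score weights the rare type equally. A gap between the two, as in the reported 0.864/0.837, is consistent with rarer entity types being extracted less reliably, although the abstract does not give the per-entity breakdown.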
Problem

Research questions and friction points this paper is trying to address.

clinical named entity recognition
privacy-sensitive
dental progress notes
unstructured text
clinical information extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-prompting
privacy-preserving
clinical information extraction
direct preference optimization (DPO)
small language models
Yao-Shun Chuang
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Tushti Mody
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Uday Pratap Singh
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Shirindokht Shiraz
School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Chun-Teh Lee
School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77054, United States
Ryan Brandon
Willamette Dental and Skourtes Institute, Hillsboro, OR 97123, United States
Muhammad F Walji
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Xiaoqian Jiang
McWilliams School of Biomedical Informatics, UTHealth
predictive modeling, healthcare privacy
Bunmi Tokede
School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77054, United States