A Multi-Stage Large Language Model Framework for Extracting Suicide-Related Social Determinants of Health

📅 2025-08-06
🤖 AI Summary
This study addresses key challenges in extracting suicide-related social determinants of health (SDoH) from unstructured text: low extraction accuracy, long-tailed factor distributions, difficulty identifying the pivotal stressors that precede an incident, and limited model explainability. It proposes a multi-stage large language model (LLM) framework that combines fine-grained context retrieval, explicit intermediate reasoning, and task-specific fine-tuning to balance accuracy and transparency. The framework is compared against BioBERT, GPT-3.5-turbo, and the reasoning model DeepSeek-R1, and a smaller fine-tuned, task-specific model is reported to reach comparable or better performance at lower inference cost. Experiments show improvements over these baselines in both SDoH factor extraction and retrieval of relevant context. A pilot user study reports that the model's explanations improve annotation efficiency by 23.6% and accuracy by 18.4%, which supports early identification of, and intervention for, individuals at risk.

📝 Abstract
Background: Understanding social determinants of health (SDoH) factors contributing to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, analyzing pivotal stressors preceding suicide incidents, and limited model explainability.

Methods: We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. Our approach was compared to other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated how the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study.

Results: We show that our proposed framework demonstrated performance boosts in the overarching task of extracting SDoH factors and in the finer-grained tasks of retrieving relevant context. Additionally, we show that fine-tuning a smaller, task-specific model achieves comparable or better performance with reduced inference costs. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability.

Conclusions: Our approach improves both the accuracy and transparency of extracting suicide-related SDoH from unstructured texts. These advancements have the potential to support early identification of individuals at risk and inform more effective prevention strategies.

Problem

Research questions and friction points this paper is trying to address.

Extracting suicide-related social determinants from unstructured text
Addressing long-tailed factor distributions and model explainability
Improving early risk identification and prevention strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage LLM framework for SDoH extraction
Fine-tuning smaller models for cost efficiency
Intermediate explanations enhance model transparency
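
The multi-stage idea listed above can be illustrated with a small, hedged sketch. It is not the authors' implementation: the factor names, the keyword lexicon, and all function names are hypothetical, and simple keyword matching stands in for the LLM calls a real system would make. It only shows the general shape of a pipeline that first retrieves relevant context, then produces an intermediate explanation, and only then emits a factor label.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical cue lexicon standing in for a model; not taken from the paper.
FACTOR_KEYWORDS = {
    "financial_problem": ["lost his job", "eviction", "debt"],
    "relationship_problem": ["divorce", "breakup", "argument"],
}


@dataclass
class Extraction:
    factor: str
    context: str      # stage 1 output: the sentence judged relevant
    explanation: str  # stage 2 output: rationale an annotator can inspect


def retrieve_context(text: str, factor: str) -> list[str]:
    """Stage 1 (stub): keep sentences that contain a cue for the factor."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    cues = FACTOR_KEYWORDS[factor]
    return [s for s in sentences if any(c in s.lower() for c in cues)]


def explain(sentence: str, factor: str) -> str:
    """Stage 2 (stub): build a rationale; a real system would query an LLM."""
    cue = next(c for c in FACTOR_KEYWORDS[factor] if c in sentence.lower())
    return f"The phrase '{cue}' suggests {factor}."


def extract_factors(text: str) -> list[Extraction]:
    """Stage 3: emit a factor only when stage 1 found supporting context."""
    results = []
    for factor in FACTOR_KEYWORDS:
        for sentence in retrieve_context(text, factor):
            results.append(Extraction(factor, sentence, explain(sentence, factor)))
    return results


if __name__ == "__main__":
    note = "He lost his job in March. A divorce was finalized in May. He enjoyed fishing."
    for item in extract_factors(note):
        print(item.factor, "|", item.context, "|", item.explanation)
```

Keeping the retrieved context and the explanation alongside each label is what lets a human reviewer check why a factor was assigned, which is the transparency benefit the summary attributes to the multi-stage design.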
Authors

Song Wang
Cockrell School of Engineering, The University of Texas at Austin, Austin, Texas, USA
Yishu Wei
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Haotian Ma
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Max Lovitt
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Kelly Deng
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Yuan Meng
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Zihan Xu
Arizona State University
Machine Learning, Neuromorphic Computing, Memory
Jingze Zhang
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Yunyu Xiao
Weill Cornell Medicine | NewYork-Presbyterian. Department of Population Health Sciences | Health
Suicide, Mental Health, Health Disparities, Health Data Science
Ying Ding
Bill & Lewis Suit Professor, School of Information, Dell Med, University of Texas at Austin
AI in Health, Knowledge Graph, Science of Science
Xuhai Xu
Assistant Professor, Columbia University | Google
Human-Computer Interaction, Ubiquitous Computing, Human-Centered AI, mHealth, Health Informatics
Joydeep Ghosh
(Chaired) Professor, ECE Dept., Univ. Texas at Austin; Faculty Dell Med, UT-Comp. Sc., McCombs
Machine Learning, Data Mining, Ethical AI, Personalization, AI/ML for Healthcare
Yifan Peng
Population Health Sciences, Weill Cornell Medicine, New York, New York, USA