A Multi-Stage Large Language Model Framework for Extracting Suicide-Related Social Determinants of Health

📅 2025-08-06

📈 Citations: 0

✨ Influential: 0

career value

204K/year

🤖 AI Summary

This study addresses key challenges in extracting suicide-related social determinants of health (SDoH) from unstructured text: low extraction accuracy, severe long-tail distribution, weak temporal identification of critical stressors, and poor model interpretability. To tackle these, we propose a multi-stage large language model (LLM) framework integrating fine-grained contextual retrieval, explicit intermediate reasoning chain modeling, and task-oriented fine-tuning—balancing precision and transparency. The framework synergistically leverages BioBERT and DeepSeek-R1, augmented by prompt engineering and knowledge distillation into lightweight models to optimize performance-efficiency trade-offs. Experiments demonstrate significant improvements over baselines in both SDoH extraction and contextual recall. User evaluation confirms that our interpretable outputs enhance annotation efficiency by 23.6% and accuracy by 18.4%, providing reliable, trustworthy data support for early identification and intervention targeting high-risk individuals.

Technology Category

Application Category

📝 Abstract

Background: Understanding social determinants of health (SDoH) factors contributing to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, analyzing pivotal stressors preceding suicide incidents, and limited model explainability. Methods: We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. Our approach was compared to other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated how the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study. Results: We show that our proposed framework demonstrated performance boosts in the overarching task of extracting SDoH factors and in the finer-grained tasks of retrieving relevant context. Additionally, we show that fine-tuning a smaller, task-specific model achieves comparable or better performance with reduced inference costs. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability. Conclusions: Our approach improves both the accuracy and transparency of extracting suicide-related SDoH from unstructured texts. These advancements have the potential to support early identification of individuals at risk and inform more effective prevention strategies.

Problem

Research questions and friction points this paper is trying to address.

Extracting suicide-related social determinants from unstructured text

Addressing long-tailed factor distributions and model explainability

Improving early risk identification and prevention strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage LLM framework for SDoH extraction

Fine-tuning smaller models for cost efficiency

Intermediate explanations enhance model transparency

🔎 Similar Papers

No similar papers found.