Supporting Intervention Design for Suicide Prevention with Language Model Assistants

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the low efficiency and high cost of manually annotating free-text narratives in U.S. National Violent Death Reporting System (NVDRS) suicide death reports. The authors propose a human-in-the-loop learning framework that uses language models (LMs) to automatically extract 50 key variables and incorporates expert feedback to iteratively refine annotation guidelines, letting domain experts focus on error correction and disagreement analysis. Contributions include: (1) empirical evidence that 38% of cases where LM predictions disagree with existing annotations reflect genuine annotation discrepancies, supporting the use of LM disagreement for surfacing annotation errors; (2) an 85% agreement rate between LM predictions and existing human annotations across the 50 variables, with a newly introduced variable annotated at quality comparable to a fully manual approach. The framework improves efficiency and consistency in processing sensitive textual data, offering a scalable, low-burden paradigm for structured data curation in public health.

📝 Abstract
Warning: This paper discusses topics of suicide and suicidal ideation, which may be distressing to some readers. The National Violent Death Reporting System (NVDRS) documents information about suicides in the United States, including free text narratives (e.g., circumstances surrounding a suicide). In a demanding public health data pipeline, annotators manually extract structured information from death investigation records following extensive guidelines developed painstakingly by experts. In this work, we facilitate data-driven insights from the NVDRS data to support the development of novel suicide interventions by investigating the value of language models (LMs) as efficient assistants to these (a) data annotators and (b) experts. We find that LM predictions match existing data annotations about 85% of the time across 50 NVDRS variables. In the cases where the LM disagrees with existing annotations, expert review reveals that LM assistants can surface annotation discrepancies 38% of the time. Finally, we introduce a human-in-the-loop algorithm to assist experts in efficiently building and refining guidelines for annotating new variables by allowing them to focus only on providing feedback for incorrect LM predictions. We apply our algorithm to a real-world case study for a new variable that characterizes victim interactions with lawyers and demonstrate that it achieves comparable annotation quality with a laborious manual approach. Our findings provide evidence that LMs can serve as effective assistants to public health researchers who handle sensitive data in high-stakes scenarios.
Problem

Research questions and friction points this paper is trying to address.

Using language models to assist suicide prevention data annotation
Identifying annotation discrepancies in sensitive suicide data
Developing human-in-the-loop algorithms for guideline refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language models assist suicide data annotation
Human-in-the-loop algorithm refines annotation guidelines
LM predictions agree with human annotations 85% of the time
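The human-in-the-loop refinement loop summarized above can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-rule stand-in for the LM, the guideline dictionary, and all function names are hypothetical, chosen only to show the loop structure (LM annotates, experts review only incorrect predictions, the guideline is refined and the loop repeats).

```python
def lm_annotate(narrative, guideline):
    """Stand-in for an LM call: predict a binary variable per the guideline.
    Simulated here with a simple keyword rule (an assumption for demo)."""
    return any(kw in narrative.lower() for kw in guideline["keywords"])

def refine_guideline(guideline, narrative):
    """Stand-in for expert feedback on an incorrect prediction: here, a toy
    refinement that adds a missed cue word to the guideline."""
    if "attorney" not in guideline["keywords"]:
        guideline["keywords"].append("attorney")
    return guideline

def human_in_the_loop(records, guideline, gold, max_rounds=3):
    """Iterate: LM annotates all records; experts review only the incorrect
    predictions and refine the guideline until predictions stabilize."""
    preds = []
    for _ in range(max_rounds):
        preds = [lm_annotate(r, guideline) for r in records]
        errors = [i for i, p in enumerate(preds) if p != gold[i]]
        if not errors:
            break  # guideline is good enough; experts did no redundant review
        for i in errors:
            guideline = refine_guideline(guideline, records[i])
    return preds, guideline

# Hypothetical narratives for the lawyer-interaction variable from the case study
records = [
    "victim consulted a lawyer about custody",
    "victim spoke with an attorney before the incident",
    "no legal contact documented",
]
gold = [True, True, False]          # expert-verified labels (illustrative)
guideline = {"keywords": ["lawyer"]}  # initial, incomplete guideline

preds, refined = human_in_the_loop(records, guideline, gold)
```

After one refinement round the guideline also covers "attorney" and the predictions match the expert labels, mirroring the paper's point that experts only need to intervene on the LM's errors rather than annotate every record.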
Jaspreet Ranjit
University of Southern California
Natural Language Processing
Hyundong J. Cho
Information Sciences Institute, University of Southern California
Claire J. Smerdon
Thomas Lord Dept. of Computer Science, University of Southern California
Yoonsoo Nam
University of Oxford
Theory of Machine Learning
Myles Phung
Thomas Lord Dept. of Computer Science, University of Southern California
Jonathan May
University of Southern California, Information Sciences Institute
Machine Translation, Machine Learning, Natural Language Processing
John R. Blosnich
Suzanne Dworak-Peck School of Social Work, University of Southern California
Swabha Swayamdipta
University of Southern California
Natural Language Processing, Machine Learning