🤖 AI Summary
Drug overdose deaths in the United States, particularly those involving fentanyl, are rising, yet critical cause-of-death information remains buried in unstructured free-text coroner reports, and coding that text into ICD-10 introduces delays and information loss. To address this, the authors fine-tune a domain-adapted language model, BioClinicalBERT, for multi-label drug identification, extracting the specific substances involved directly from free-text death records. The model remains robust under cross-year external validation, achieving a macro-F1 score of 0.966 on 2023-2024 records (internal test ≥ 0.998) and outperforming conventional machine learning methods, general-domain BERT models, and decoder-only large language models. This supports near real-time surveillance of emerging drug trends and gives public health interventions timelier data than manual ICD-10 coding can provide.
📝 Abstract
The rising rate of drug-related deaths in the United States, largely driven by fentanyl, requires timely and accurate surveillance. However, critical overdose data are often buried in free-text coroner reports, leading to delays and information loss when they are coded into International Classification of Diseases, 10th Revision (ICD-10) categories. Natural language processing (NLP) models may automate and enhance overdose surveillance, but prior applications have been limited. A dataset of 35,433 death records from multiple U.S. jurisdictions in 2020 was used for model training and internal testing. External validation was conducted using a novel, separate dataset of 3,335 records from 2023-2024. Multiple NLP approaches were evaluated for classifying specific drug involvement from unstructured death certificate text. These included traditional single- and multi-label classifiers, fine-tuned encoder-only language models such as Bidirectional Encoder Representations from Transformers (BERT) and BioClinicalBERT, and contemporary decoder-only large language models such as Qwen 3 and Llama 3. Model performance was assessed using macro-averaged F1 scores, and 95% confidence intervals were calculated to quantify uncertainty. Fine-tuned BioClinicalBERT models achieved near-perfect performance, with macro F1 scores ≥ 0.998 on the internal test set. External validation confirmed robustness (macro F1 = 0.966), with BioClinicalBERT outperforming conventional machine learning, general-domain BERT models, and various decoder-only large language models. NLP models, particularly fine-tuned clinical-domain variants such as BioClinicalBERT, offer a highly accurate and scalable solution for overdose death classification from free-text reports. These methods can significantly accelerate surveillance workflows, overcoming the limitations of manual ICD-10 coding and supporting near real-time detection of emerging substance use trends.
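The evaluation metric named in the abstract, macro-averaged F1 with a 95% confidence interval, can be illustrated with a minimal sketch. The abstract does not say how the intervals were computed; the percentile bootstrap below is an assumption, as are the function names `macro_f1` and `bootstrap_ci`, the zero-score convention for labels with no positives, and the toy two-drug data.

```python
import random


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 for multi-label data.

    y_true and y_pred are lists of rows; each row holds one 0/1 flag per drug.
    F1 is computed per label and then averaged with equal weight.
    """
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no true or predicted positives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for macro F1 (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample records with replacement, keeping truth and prediction paired.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example with two hypothetical labels per record, e.g. [fentanyl, cocaine].
y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [0, 1], [0, 0]]

print("macro F1:", macro_f1(y_true, y_pred))
print("95% CI:", bootstrap_ci(y_true, y_pred))
```

Because macro averaging weights every drug label equally, a rarely mentioned substance affects the score as much as a common one, which is why a single missed label in the toy data lowers the result noticeably.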