🤖 AI Summary
This work addresses the challenge that social determinants of health (SDOH) are predominantly recorded in unstructured clinical text, which hinders their direct use in computational analysis. To overcome this, the authors propose a lightweight prompt engineering approach for extracting structured SDOH events with reasoning-based large language models (LLMs). The approach combines concise, guideline-driven prompts, carefully selected few-shot examples, self-consistency decoding, and post-processing quality control. It achieves a micro-averaged F1 score of 0.866 on standard benchmarks, comparable to state-of-the-art models, while substantially reducing implementation complexity. The result suggests that the extraction pipeline can be simplified without sacrificing accuracy, offering a practical and scalable option for extracting SDOH information from clinical narratives.
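Micro-averaging pools true positives (TP), false positives (FP), and false negatives (FN) across all label types before computing a single precision, recall, and F1, so frequent types weigh more than rare ones. The sketch below illustrates that computation; the `Counts` structure and the label names in the usage example are hypothetical and are not taken from the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """True positives, false positives and false negatives for one label type."""
    tp: int
    fp: int
    fn: int


def micro_f1(per_type: dict) -> float:
    """Pool counts over all label types, then compute one precision/recall/F1."""
    tp = sum(c.tp for c in per_type.values())
    fp = sum(c.fp for c in per_type.values())
    fn = sum(c.fn for c in per_type.values())
    if tp == 0:
        # No correct predictions: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two label types.
counts = {"Alcohol": Counts(tp=8, fp=2, fn=1), "Tobacco": Counts(tp=1, fp=0, fn=4)}
print(micro_f1(counts))  # pooled TP=9, FP=2, FN=5 -> 18 / 25 = 0.72
```

Because the counts are pooled, the poorly recalled `Tobacco` type pulls the score down much less than it would under macro-averaging, which averages per-type F1 scores with equal weight.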
📝 Abstract
Social Determinants of Health (SDOH) are the environmental, behavioral, and social conditions that influence how individuals live, work, and age. SDOH significantly affect individual health outcomes, and identifying and managing them systematically can substantially improve patient care. However, SDOH information is predominantly captured in unstructured clinical notes within electronic health records, which limits its direct use as machine-readable entities. To address this issue, researchers have applied Natural Language Processing (NLP) techniques based on pre-trained BERT models, achieving promising performance but requiring sophisticated implementation and extensive computational resources. In this study, we investigated prompt engineering strategies for extracting structured SDOH events using large language models (LLMs) with advanced reasoning capabilities. Our method consists of four modules: 1) concise, descriptive prompts that integrate established guidelines, 2) few-shot learning with carefully curated examples, 3) a self-consistency mechanism to ensure robust outputs, and 4) post-processing for quality control. Our approach achieved a micro-F1 score of 0.866, competitive with the leading models. These results show that LLMs with reasoning capabilities are an effective solution for SDOH event extraction, offering both implementation simplicity and strong performance.
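The four modules can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the line-based `Type|status` output format, the toy schema in `ALLOWED_STATUS`, the `GUIDELINES` and `FEW_SHOT` strings, and the `generate` callable (standing in for a sampled, temperature > 0 call to whichever reasoning LLM is used) are all hypothetical. Self-consistency is implemented here as a per-event majority vote over sampled outputs, which is one common way to adapt it to structured extraction; the paper's exact aggregation rule is not stated in the abstract.

```python
from collections import Counter
from typing import Callable

# Hypothetical guideline text, few-shot examples and schema (placeholders only).
GUIDELINES = "Extract SDOH events as 'Type|status' lines, one event per line."
FEW_SHOT = [("Quit smoking 10 years ago.", "Tobacco|past")]
ALLOWED_STATUS = {
    "Alcohol": {"none", "current", "past"},
    "Tobacco": {"none", "current", "past"},
}


def build_prompt(guidelines: str, examples: list, note: str) -> str:
    """Modules 1 and 2: concise guideline instructions followed by few-shot examples."""
    parts = [guidelines.strip()]
    for text, answer in examples:
        parts.append(f"Note: {text}\nEvents: {answer}")
    parts.append(f"Note: {note}\nEvents:")
    return "\n\n".join(parts)


def parse_events(raw: str) -> set:
    """Parse lines like 'Alcohol|current' into (type, status) pairs; skip malformed lines."""
    events = set()
    for line in raw.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 2 and all(fields):
            events.add((fields[0], fields[1]))
    return events


def self_consistency(samples: list, min_votes: int) -> set:
    """Module 3: keep events predicted in at least `min_votes` sampled outputs."""
    votes = Counter()
    for raw in samples:
        # parse_events returns a set, so each sample votes at most once per event.
        votes.update(parse_events(raw))
    return {event for event, n in votes.items() if n >= min_votes}


def post_process(events: set) -> set:
    """Module 4: drop events whose type or status falls outside the schema."""
    return {(t, s) for t, s in events if s in ALLOWED_STATUS.get(t, set())}


def extract(note: str, generate: Callable[[str], str], n_samples: int = 5) -> set:
    """Sample the model n_samples times, majority-vote, then validate against the schema."""
    prompt = build_prompt(GUIDELINES, FEW_SHOT, note)
    samples = [generate(prompt) for _ in range(n_samples)]
    return post_process(self_consistency(samples, min_votes=n_samples // 2 + 1))


# Usage with a fake generator that replays five canned outputs.
outputs = iter([
    "Alcohol|current\nTobacco|past",
    "Alcohol|current",
    "Alcohol|current\nTobacco|past\nDrug|current",
    "Alcohol|none",
    "Tobacco|past\nAlcohol|current",
])
print(extract("Drinks socially; former smoker.", lambda prompt: next(outputs)))
```

With five samples the vote threshold is three, so `Alcohol|current` (four votes) and `Tobacco|past` (three votes) survive, while the one-off `Drug|current` and `Alcohol|none` predictions are discarded before schema validation even runs.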