Improving Social Determinants of Health Documentation in French EHRs Using Large Language Models

📅 2025-07-04
đŸ€– AI Summary
Problem: Structured documentation of Social Determinants of Health (SDoH) in French electronic health records (EHRs) is severely inadequate: ICD-10 codes from structured EHR data identify at least one SDoH for only 2.8% of patients. Method: This study introduces the first large language model-based (Flan-T5-Large) multi-class SDoH information extraction system for French clinical text, automatically identifying 13 SDoH categories, including temporal attributes and quantitative details, from unstructured clinical notes. It integrates manually annotated social history texts with fine-grained prompt engineering to jointly perform named entity recognition and relation extraction. Contribution/Results: The model achieves an F1-score above 0.80 for well-documented categories (living condition, marital status, descendants, job, tobacco, and alcohol use), scores lower on categories with limited training data or highly variable expressions (employment status, housing, physical activity, income, and education), and identifies at least one SDoH for 95.8% of patients. The authors publicly release two high-quality, expert-annotated French SDoH datasets, the first of their kind, addressing the critical gap in non-English clinical NLP resources and enabling scalable public health research and health equity analysis.

📝 Abstract
Social determinants of health (SDoH) significantly influence health outcomes, shaping disease progression, treatment adherence, and health disparities. However, their documentation in structured electronic health records (EHRs) is often incomplete or missing. This study presents an approach based on large language models (LLMs) for extracting 13 SDoH categories from French clinical notes. We trained Flan-T5-Large on annotated social history sections from clinical notes at Nantes University Hospital, France. We evaluated the model at two levels: (i) identification of SDoH categories and associated values, and (ii) extraction of detailed SDoH with associated temporal and quantitative information. Model performance was assessed across four datasets, including two that we publicly release as open resources. The model achieved strong performance for identifying well-documented categories such as living condition, marital status, descendants, job, tobacco, and alcohol use (F1 score > 0.80). Performance was lower for categories with limited training data or highly variable expressions, such as employment status, housing, physical activity, income, and education. Our model identified 95.8% of patients with at least one SDoH, compared to 2.8% for ICD-10 codes from structured EHR data. Our error analysis showed that performance limitations were linked to annotation inconsistencies, reliance on an English-centric tokenizer, and reduced generalizability due to the model being trained on social history sections only. These results demonstrate the effectiveness of NLP in improving the completeness of real-world SDoH data in a non-English EHR system.
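To make the first evaluation level concrete, here is a minimal Python sketch of a per-category F1 computation over (category, value) pairs. It is an illustration only: the abstract does not state the exact matching criteria, so exact-match scoring is an assumption, and the function name `per_category_f1`, the category labels, and the values in the toy data are hypothetical.

```python
from collections import defaultdict


def per_category_f1(gold, pred):
    """Per-category F1 from two parallel lists (one entry per note)
    of sets of (category, value) pairs. Exact match is assumed."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        for cat, _ in g & p:   # predicted and annotated
            tp[cat] += 1
        for cat, _ in p - g:   # predicted but not annotated
            fp[cat] += 1
        for cat, _ in g - p:   # annotated but missed
            fn[cat] += 1
    scores = {}
    for cat in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[cat] + fp[cat] + fn[cat]
        # F1 = 2TP / (2TP + FP + FN)
        scores[cat] = 2 * tp[cat] / denom if denom else 0.0
    return scores


# Toy data with hypothetical labels, not taken from the paper.
gold = [
    {("tobacco", "current"), ("marital_status", "married")},
    {("alcohol", "none")},
]
pred = [
    {("tobacco", "current"), ("marital_status", "single")},
    {("alcohol", "none"), ("tobacco", "former")},
]
scores = per_category_f1(gold, pred)
```

Scoring each category separately, rather than pooling all pairs, is what lets well-documented categories such as tobacco be distinguished from sparse ones such as income or education.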
Problem

Research questions and friction points this paper is trying to address.

Extracting SDoH from French clinical notes using LLMs
Improving incomplete SDoH documentation in structured EHRs
Evaluating model performance on diverse SDoH categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Flan-T5-Large for SDoH extraction
Trains on French clinical notes
Evaluates model on multiple datasets
Adrien Bazoge
Data Clinic, University Hospital of Nantes, France
PacĂŽme Constant dit Beaufils
Department of Neuroradiology, University Hospital of Nantes, Thorax Institute, France
Mohammed Hmitouch
Data Clinic, University Hospital of Nantes, France
Romain Bourcier
Department of Neuroradiology, University Hospital of Nantes, Thorax Institute, France
Emmanuel Morin
Nantes UniversitĂ©, École Centrale Nantes, CNRS, LS2N, France
Richard Dufour
LS2N - TALN/NLP research group - Nantes University
Natural language processing, Biomedical domain, Language modeling, Spontaneous speech
Béatrice Daille
Nantes UniversitĂ©, École Centrale Nantes, CNRS, LS2N, France
Pierre-Antoine Gourraud
Data Clinic, University Hospital of Nantes, France
Matilde Karakachoff
Data Clinic, University Hospital of Nantes, France