Reducing Large Language Model Safety Risks in Women's Health using Semantic Entropy

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the risk that large language models (LLMs) hallucinate in obstetrics and gynaecology, with potential consequences for maternal and neonatal safety, this study evaluates a semantic entropy-based uncertainty quantification method, the first semantic-level uncertainty modelling for AI applications in women’s health. Unlike conventional approaches that rely on token-level probabilities or perplexity, the method detects hallucinations by measuring semantic consistency across multiple model outputs. On a clinically validated question set derived from the UK RCOG MRCOG examinations, it achieves an AUROC of 0.76, substantially outperforming perplexity-based detection (AUROC = 0.62). Against expert-annotated ground truth from obstetricians and gynaecologists, it attains an AUROC of 0.97. Semantic clustering and human evaluation further support its ability to improve the reliability and safety of LLM outputs in high-stakes clinical decision-making.

📝 Abstract
Large language models (LLMs) hold substantial promise for clinical decision support. However, their widespread adoption in medicine is hindered by their propensity to generate false or misleading outputs, known as hallucinations. In high-stakes domains such as women's health (obstetrics and gynaecology), where errors in clinical reasoning can have profound consequences for maternal and neonatal outcomes, ensuring the reliability of AI-generated responses is critical. Traditional methods for quantifying uncertainty, such as perplexity, fail to capture the meaning-level inconsistencies that lead to misinformation. Here, we evaluate semantic entropy (SE), a novel uncertainty metric that assesses meaning-level variation, to detect hallucinations in AI-generated medical content. Using a clinically validated dataset derived from UK RCOG MRCOG examinations, we compared SE with perplexity in identifying uncertain responses. SE demonstrated superior performance, achieving an AUROC of 0.76 (95% CI: 0.75-0.78), compared to 0.62 (0.60-0.65) for perplexity. Clinical expert validation further confirmed its effectiveness, with SE achieving near-perfect uncertainty discrimination (AUROC: 0.97). While semantic clustering was successful in only 30% of cases, SE remains a valuable tool for improving AI safety in women's health. These findings suggest that SE could enable more reliable AI integration into clinical practice, particularly in resource-limited settings where LLMs could augment care. This study highlights the potential of SE as a key safeguard in the responsible deployment of AI-driven tools in women's health, leading to safer and more effective digital health interventions.
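
To make the metric concrete, here is a minimal sketch of how semantic entropy can be computed over sampled answers, following the general semantic entropy recipe (Kuhn et al., ICLR 2023): sample several answers to the same question, cluster them by bidirectional entailment using a natural language inference (NLI) model, and take the entropy of the distribution over meaning clusters. The NLI model, the sample count, and the count-based cluster probabilities below are illustrative assumptions, not this paper's exact configuration.

```python
# Minimal sketch of semantic entropy, assuming an off-the-shelf NLI model;
# the paper's exact models and clustering details may differ.
import math

from transformers import pipeline

# NLI classifier used to judge whether two answers express the same meaning.
nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

def entails(premise: str, hypothesis: str) -> bool:
    """True if the NLI model predicts entailment for premise -> hypothesis."""
    pred = nli([{"text": premise, "text_pair": hypothesis}])[0]
    return pred["label"].upper() == "ENTAILMENT"

def same_meaning(a: str, b: str) -> bool:
    """Bidirectional entailment: both answers entail each other."""
    return entails(a, b) and entails(b, a)

def semantic_entropy(answers: list[str]) -> float:
    """Cluster sampled answers by meaning, then return the entropy (nats)
    of the empirical cluster distribution. High entropy means the samples
    disagree in meaning, which is the hallucination signal."""
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):  # compare to cluster exemplar
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Usage: sample multiple answers to one exam question from the LLM under
# test (llm() is a hypothetical generation call), then flag high entropy.
# answers = [llm(question, temperature=1.0) for _ in range(10)]
# uncertain = semantic_entropy(answers) > 0.7  # threshold is illustrative
```

A greedy, exemplar-based clustering like this is cheap but imperfect; the abstract notes that semantic clustering succeeded in only 30% of cases, so clustering quality is a practical bottleneck even when the resulting entropy ranks uncertain answers well.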
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in AI-generated medical content
Ensuring reliability of AI in women's health decisions
Improving AI safety using semantic entropy metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic entropy detects AI-generated misinformation effectively.
Superior performance over perplexity in uncertainty identification (see the AUROC sketch after this list).
Enhances AI safety in women's health applications.
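
To reproduce the comparison step in outline: score each answer with both metrics (higher = more uncertain) and compute AUROC against expert hallucination labels. A hedged sketch with scikit-learn follows; the arrays are placeholders, not the paper's data.

```python
# Hedged sketch: comparing semantic entropy (SE) and perplexity (PPL) as
# hallucination detectors via AUROC, mirroring the paper's evaluation
# (reported: SE 0.76 vs PPL 0.62). All numbers below are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

# Expert annotation: 1 = hallucinated answer, 0 = correct answer.
labels = np.array([1, 0, 0, 1, 0, 1, 0, 0])

# Detector scores; higher should mean "more likely hallucinated".
se_scores = np.array([1.9, 0.2, 0.5, 1.4, 0.0, 1.1, 0.3, 0.6])
ppl_scores = np.array([8.2, 6.9, 7.5, 6.1, 5.8, 7.7, 7.0, 6.4])

print("SE  AUROC:", roc_auc_score(labels, se_scores))
print("PPL AUROC:", roc_auc_score(labels, ppl_scores))
```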
Authors

Jahan C. Penny-Dimri
Oxford Digital Health Labs, Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK.

Magdalena Bachmann
Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK.

William Cooke
University of Oxford

Sam Mathewlynn
Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK.

Samuel Dockree
Medical doctor, Oxford University Hospitals
Obstetrics and Gynaecology

John Tolladay
Oxford Digital Health Labs, Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK.

Jannik Kossen
FAIR, Meta

Lin Li
OATML, Department of Computer Science, University of Oxford, Oxford, UK.

Yarin Gal
Professor of Machine Learning, University of Oxford
Machine Learning, Artificial Intelligence, Probability Theory, Statistics

Gabriel Davis Jones
University of Oxford
Maternal and Neonatal Health, Neuroscience, Computer Science, Artificial Intelligence, Global Health