Trustworthy and Practical AI for Healthcare: A Guided Deferral System with Large Language Models

📅 2024-06-11
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) deployed in healthcare suffer from frequent hallucinations, reliance on proprietary systems, and high privacy-compliance risks. Method: We propose a lightweight, clinically trustworthy human-AI collaboration framework featuring (1) a guided deferred-decision mechanism that automatically escalates uncertain cases to domain experts; (2) the Imbalanced Expected Calibration Error (IECE) metric—specifically designed to evaluate uncertainty calibration under medical class imbalance; and (3) integration of open-source lightweight LLMs, probabilistic uncertainty modeling, and interpretable handover protocols. Results: In real-world clinical pilot deployments, our approach significantly improves report classification accuracy and clinician acceptability while reducing calibration error by 32%. It simultaneously ensures safety, regulatory compliance (e.g., HIPAA/GDPR), and practical deployability—without compromising performance or interpretability.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs) offer a valuable technology for various applications in healthcare. However, their tendency to hallucinate and the existing reliance on proprietary systems pose challenges in environments concerning critical decision-making and strict data privacy regulations, such as healthcare, where the trust in such systems is paramount. Through combining the strengths and discounting the weaknesses of humans and AI, the field of Human-AI Collaboration (HAIC) presents one front for tackling these challenges and hence improving trust. This paper presents a novel HAIC guided deferral system that can simultaneously parse medical reports for disorder classification, and defer uncertain predictions with intelligent guidance to humans. We develop methodology which builds efficient, effective and open-source LLMs for this purpose, for the real-world deployment in healthcare. We conduct a pilot study which showcases the effectiveness of our proposed system in practice. Additionally, we highlight drawbacks of standard calibration metrics in imbalanced data scenarios commonly found in healthcare, and suggest a simple yet effective solution: the Imbalanced Expected Calibration Error.
Problem

Research questions and friction points this paper is trying to address.

Enhance trust in healthcare AI systems
Address LLM hallucinations in medical applications
Improve decision-making with Human-AI collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Guided deferral system
Open-source LLMs
Imbalanced calibration error
🔎 Similar Papers
No similar papers found.