🤖 AI Summary
This study investigates the differential impact and potential risks of eXplainable Artificial Intelligence (XAI) in dermatological diagnosis for laypersons versus primary-care clinicians. We integrate a fairness-aware dermatology AI model, a multimodal large language model (LLM)-based explainer, and a human-centered experimental design to conduct large-scale user studies quantifying automation bias and decision fairness. Our key contributions are threefold: (1) we identify, for the first time, a proficiency-dependent bidirectional effect of XAI: it significantly increases public trust while exacerbating automation bias (a 23% accuracy drop when the AI errs), yet improves diagnostic robustness among clinicians; (2) we demonstrate that explanation timing is critical, as presenting the AI recommendation before the explanation significantly degrades judgment quality in erroneous cases; and (3) we empirically validate that XAI assistance improves cross-skin-tone diagnostic accuracy and reduces demographic performance disparities. These findings provide crucial human-factors evidence and actionable design principles for the responsible clinical deployment of XAI.
📝 Abstract
Artificial intelligence (AI) is increasingly permeating healthcare, from physician assistants to consumer applications. Because the opacity of AI algorithms hinders human interaction, explainable AI (XAI) aims to provide insight into AI decision-making; however, evidence suggests XAI can paradoxically induce over-reliance or bias. We present results from two large-scale experiments (623 laypeople; 153 primary care physicians, PCPs) that combined a fairness-aware diagnostic AI model with different XAI explanations to examine how XAI assistance, particularly from multimodal large language models (LLMs), influences diagnostic performance. AI assistance balanced across skin tones improved accuracy and reduced diagnostic disparities. However, LLM explanations yielded divergent effects: lay users showed greater automation bias (accuracy boosted when the AI was correct, reduced when it erred), while experienced PCPs remained resilient, benefiting irrespective of AI accuracy. Presenting AI suggestions before explanations also led to worse outcomes for both groups when the AI was incorrect. These findings highlight XAI's varying impact across expertise levels and presentation timing, underscoring LLMs as a "double-edged sword" in medical AI and informing the design of future human-AI collaborative systems.