Beyond Accuracy, SHAP, and Anchors - On the difficulty of designing effective end-user explanations

📅 2025-01-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Contemporary machine learning models' complexity poses significant trust, regulatory, and ethical risks, yet existing explainability guidelines lack operational specificity. To address this gap, we conducted a controlled experiment with 124 developers, integrating cognitive process theory and the sociological imagination, to investigate how policy frameworks influence the design of end-user-oriented explanations for a diabetic retinopathy screening model. Results reveal that participants across the board struggled to generate high-quality, policy-compliant, and empirically verifiable explanations; over 70% failed to accurately anticipate users' comprehension barriers; and widely adopted technical methods (e.g., SHAP, Anchors) are misaligned with real-world stakeholder needs. The core contribution is identifying developers' difficulty in empathizing with non-technical stakeholders as a central mechanism underlying explanation failure, and recommending empathy-centered educational interventions to bridge this gap.
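For context on the mismatch the study highlights, the snippet below is a minimal sketch of the kind of output SHAP produces: signed per-feature attribution scores. It uses a hypothetical stand-in (an XGBoost classifier on the public breast-cancer dataset), not the paper's retinopathy screening model, and is only meant to illustrate why such output suits developers better than non-technical end users.

```python
# Minimal sketch (hypothetical stand-in model, not the paper's retinopathy screener)
# of what a SHAP explanation looks like: signed per-feature attributions.
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

# Stand-in tabular classification task for illustration only.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)            # tree-model-specific explainer
shap_values = explainer.shap_values(X.iloc[:1])  # attributions for one prediction

# Each value is a feature's signed contribution (in log-odds) to this prediction:
# legible to a developer, but not an explanation a patient can readily act on.
for name, value in zip(X.columns, shap_values[0]):
    print(f"{name}: {value:+.3f}")
```

Anchors, by comparison, expresses an explanation as a high-precision if-then rule over feature predicates; both formats presuppose familiarity with the model's features, which is part of the misalignment with non-technical stakeholders that the paper describes.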

📝 Abstract
Modern machine learning produces models that are impossible for users or developers to fully understand -- raising concerns about trust, oversight and human dignity. Transparency and explainability methods aim to provide some help in understanding models, but it remains challenging for developers to design explanations that are understandable to target users and effective for their purpose. Emerging guidelines and regulations set goals but may not provide effective actionable guidance to developers. In a controlled experiment with 124 participants, we investigate whether and how specific forms of policy guidance help developers design explanations for an ML-powered screening tool for diabetic retinopathy. Contrary to our expectations, we found that participants across the board struggled to produce quality explanations, comply with the provided policy requirements for explainability, and provide evidence of compliance. We posit that participant noncompliance is in part due to a failure to imagine and anticipate the needs of their audience, particularly non-technical stakeholders. Drawing on cognitive process theory and the sociological imagination to contextualize participants' failure, we recommend educational interventions.
Problem

Research questions and friction points this paper is trying to address.

Developers struggle to design end-user explanations for ML models that are understandable and effective
Policy guidance alone does not ensure explanation quality, compliance, or evidence of compliance
Participants fail to anticipate the needs of non-technical stakeholders when designing explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controlled experiment with 124 developer participants
Empirical test of whether specific forms of policy guidance improve end-user explanations
Educational interventions recommended, grounded in cognitive process theory and the sociological imagination