🤖 AI Summary
In high-stakes domains, inconsistent attributional explanations undermine trust and fairness. Method: We propose EXAGREE, the first systematic framework for explanation consistency, which (i) formally characterizes four types of ranking-based explanation discrepancies; (ii) models attributional diversity via the Rashomon set; and (iii) introduces a stakeholder-driven multi-objective optimization paradigm that jointly optimizes predictive performance, cross-group fairness, and explanation alignment, yielding Stakeholder-Aligned Explanation Models (SAEMs). Contribution/Results: We design a stakeholder-driven explanation alignment algorithm and a hybrid evaluation protocol that combines synthetic and real-world data. Across diverse benchmark datasets, EXAGREE significantly reduces explanation inconsistency (average reduction of 37.2%) and improves inter-subgroup explanation fairness (Statistical Parity Difference reduced by 0.19), delivering the first deployable explanation-coordination tool for trustworthy AI.
📝 Abstract
Explanations in machine learning are critical for trust, transparency, and fairness. Yet, complex disagreements among these explanations limit the reliability and applicability of machine learning models, especially in high-stakes environments. We formalize four fundamental ranking-based explanation disagreement problems and introduce a novel framework, EXplanation AGREEment (EXAGREE), to bridge diverse interpretations in explainable machine learning, particularly from stakeholder-centered perspectives. Our approach leverages a Rashomon set for attribution predictions and then optimizes within this set to identify Stakeholder-Aligned Explanation Models (SAEMs) that minimize disagreement with diverse stakeholder needs while maintaining predictive performance. Rigorous empirical analysis on synthetic and real-world datasets demonstrates that EXAGREE reduces explanation disagreement and improves fairness across subgroups in various domains. EXAGREE not only provides researchers with a new direction for studying explanation disagreement problems but also offers data scientists a tool for making better-informed decisions in practical applications.
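The two-stage idea in the abstract (build a Rashomon set, then search it for a model whose attribution ranking agrees with a stakeholder) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes linear models, uses absolute coefficient magnitude as a stand-in for feature attribution, approximates the Rashomon set by random perturbations around a least-squares fit, and scores agreement with Spearman rank correlation. The names `rank_of`, `spearman`, `mse`, `select_aligned_model`, and the parameters `epsilon` and `scale` are all hypothetical.

```python
import numpy as np


def rank_of(scores):
    """Rank positions (0 = most important) by absolute attribution score."""
    order = np.argsort(-np.abs(scores))
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks


def spearman(r1, r2):
    """Spearman correlation between two tie-free rank vectors."""
    n = len(r1)
    d = np.asarray(r1, dtype=float) - np.asarray(r2, dtype=float)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))


def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))


def select_aligned_model(X, y, stakeholder_ranks, epsilon=0.05,
                         n_candidates=2000, scale=0.1, seed=0):
    """Pick the near-optimal model whose ranking best matches the stakeholder.

    Assumption: the Rashomon set is approximated by sampled perturbations of
    the reference weights whose loss stays within (1 + epsilon) of the
    reference loss.
    """
    rng = np.random.default_rng(seed)
    w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
    threshold = (1.0 + epsilon) * mse(w_ref, X, y)

    # Candidate models: the reference fit plus Gaussian perturbations of it.
    noise = scale * rng.standard_normal((n_candidates, X.shape[1]))
    candidates = [w_ref] + list(w_ref + noise)

    # Keep only candidates that preserve predictive performance.
    rashomon = [w for w in candidates if mse(w, X, y) <= threshold]

    # Among those, maximize agreement with the stakeholder's ranking.
    best = max(rashomon,
               key=lambda w: spearman(rank_of(w), stakeholder_ranks))
    return best, w_ref, len(rashomon)


# Synthetic data with three features of similar importance (illustrative).
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((200, 4))
true_w = np.array([1.0, 0.9, 0.8, 0.1])
y = X @ true_w + data_rng.standard_normal(200)

# Hypothetical stakeholder belief: feature 1 matters most, then 0, 2, 3.
stakeholder_ranks = np.array([1, 0, 2, 3])

best, w_ref, n_kept = select_aligned_model(X, y, stakeholder_ranks)
print("models kept in approximate Rashomon set:", n_kept)
print("agreement of reference model:",
      spearman(rank_of(w_ref), stakeholder_ranks))
print("agreement of selected model:",
      spearman(rank_of(best), stakeholder_ranks))
```

Because the reference fit is always a member of the approximate Rashomon set, the selected model can never agree with the stakeholder less than the reference does, and its loss stays within the `epsilon` tolerance. The framework described above additionally handles multiple stakeholders and fairness across subgroups, which this sketch leaves out.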