EXAGREE: Towards Explanation Agreement in Explainable Machine Learning

📅 2024-11-04

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: In high-stakes domains, inconsistent attribution-based explanations undermine trust and fairness.

Method: The paper proposes EXAGREE, described as the first systematic framework for explanation agreement. It (i) formally characterizes four types of ranking-based explanation disagreement; (ii) models attributional diversity via the Rashomon set; and (iii) introduces a stakeholder-driven multi-objective optimization that jointly targets predictive performance, cross-group fairness, and explanation alignment, yielding Stakeholder-Aligned Explanation Models (SAEMs).

Contribution/Results: The authors design a stakeholder-driven explanation alignment algorithm and a hybrid evaluation protocol that combines synthetic and real-world data. Across diverse benchmark datasets, EXAGREE is reported to reduce explanation inconsistency (average reduction of 37.2%) and to improve inter-subgroup explanation fairness (ΔStatistical Parity Difference ↓0.19), which the summary presents as the first deployable explanation-coordination tool for trustworthy AI.

📝 Abstract

Explanations in machine learning are critical for trust, transparency, and fairness. Yet, complex disagreements among these explanations limit the reliability and applicability of machine learning models, especially in high-stakes environments. We formalize four fundamental ranking-based explanation disagreement problems and introduce a novel framework, EXplanation AGREEment (EXAGREE), to bridge diverse interpretations in explainable machine learning, particularly from stakeholder-centered perspectives. Our approach leverages a Rashomon set for attribution predictions and then optimizes within this set to identify Stakeholder-Aligned Explanation Models (SAEMs) that minimize disagreement with diverse stakeholder needs while maintaining predictive performance. Rigorous empirical analysis on synthetic and real-world datasets demonstrates that EXAGREE reduces explanation disagreement and improves fairness across subgroups in various domains. EXAGREE not only provides researchers with a new direction for studying explanation disagreement problems but also offers data scientists a tool for making better-informed decisions in practical applications.
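As a rough illustration of the selection step the abstract describes, the sketch below first filters candidate models down to a Rashomon set and then picks the one whose attribution ranking best matches a stakeholder's ranking. Everything in it is an assumption made for illustration: the `Candidate` class, the additive `epsilon` tolerance on loss, and Spearman rank correlation as the agreement score are not taken from the paper, whose actual optimization is likely more involved.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for one trained model (not from the paper)."""
    name: str
    loss: float
    attributions: list[float]  # one importance score per feature


def ranking(scores):
    """Rank position per feature (0 = most important) by absolute score."""
    order = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
    ranks = [0] * len(scores)
    for position, feature in enumerate(order):
        ranks[feature] = position
    return ranks


def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of n >= 2 items."""
    n = len(rank_a)
    if n < 2 or n != len(rank_b):
        raise ValueError("rankings must have equal length of at least 2")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def rashomon_set(candidates, epsilon):
    """Assumed definition: models whose loss is within epsilon of the best."""
    best = min(c.loss for c in candidates)
    return [c for c in candidates if c.loss <= best + epsilon]


def select_saem(candidates, stakeholder_ranks, epsilon):
    """Pick the near-optimal model that agrees most with the stakeholder."""
    pool = rashomon_set(candidates, epsilon)
    return max(
        pool,
        key=lambda c: spearman(ranking(c.attributions), stakeholder_ranks),
    )


# Toy data: the stakeholder considers feature 1 most important, then 2, then 0.
candidates = [
    Candidate("A", loss=0.20, attributions=[0.9, 0.1, 0.5]),
    Candidate("B", loss=0.21, attributions=[0.2, 0.8, 0.4]),
    Candidate("C", loss=0.35, attributions=[0.1, 0.9, 0.3]),
]
stakeholder_ranks = [2, 0, 1]
chosen = select_saem(candidates, stakeholder_ranks, epsilon=0.05)
```

With this toy data, model C matches the stakeholder's ranking but is excluded for being too inaccurate, so the choice falls on B, which is nearly as accurate as A and agrees with the stakeholder. Setting `epsilon=0.0` shrinks the pool to the single best-loss model A, which shows how the tolerance trades predictive performance against agreement.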

Problem

Research questions and friction points this paper is trying to address.

Addresses conflicting explanations from different ML attribution methods

Selects stakeholder-aligned models to maximize explanation agreement

Improves faithfulness, plausibility, and fairness while maintaining accuracy
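To make "conflicting explanations" concrete, here is a minimal sketch of two simple ranking-based agreement scores between the outputs of two attribution methods: the share of top-k features they have in common, and the share of top-k positions where they name the same feature. These are generic illustrative measures, not necessarily the four disagreement problems the paper formalizes, and the function names and toy attribution vectors are hypothetical.

```python
def top_k(attributions, k):
    """Indices of the k largest-magnitude attributions, most important first."""
    order = sorted(range(len(attributions)), key=lambda i: -abs(attributions[i]))
    return order[:k]


def feature_agreement(attr_a, attr_b, k):
    """Fraction of top-k features shared by both explanations (order ignored)."""
    return len(set(top_k(attr_a, k)) & set(top_k(attr_b, k))) / k


def rank_agreement(attr_a, attr_b, k):
    """Fraction of top-k positions where both explanations name the same feature."""
    matches = sum(a == b for a, b in zip(top_k(attr_a, k), top_k(attr_b, k)))
    return matches / k


# Toy attribution vectors for the same model and input from two methods.
method_a = [0.50, -0.30, 0.05, 0.10]
method_b = [0.45, 0.02, -0.35, 0.12]

fa_top2 = feature_agreement(method_a, method_b, k=2)
ra_top2 = rank_agreement(method_a, method_b, k=2)
```

Here both methods agree that feature 0 matters most but disagree on the runner-up (feature 1 versus feature 2), so both scores come out at one half for `k=2`.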

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework selects stakeholder-aligned explanation models

Unifies faithfulness and plausibility with a single agreement metric

Differentiable attribution network enables gradient-based constrained search
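Gradient-based search over rankings needs a smooth stand-in for the ranking operation, because hard ranks are piecewise constant. The sketch below shows one common relaxation, pairwise-sigmoid soft ranks, together with a squared-error disagreement loss against a target ranking. This summary does not say how EXAGREE makes its search differentiable, so the technique, the `temperature` parameter, and all function names are assumptions for illustration only.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_ranks(scores, temperature=0.1):
    """Smooth approximation of descending ranks (1 = largest score).

    Each rank is 1 plus a soft count of how many other scores are larger.
    Lower temperatures approach hard ranks; higher ones give smoother gradients.
    """
    ranks = []
    for i, s_i in enumerate(scores):
        total = 1.0
        for j, s_j in enumerate(scores):
            if j != i:
                total += sigmoid((s_j - s_i) / temperature)
        ranks.append(total)
    return ranks


def rank_disagreement(scores, target_ranks, temperature=0.1):
    """Squared distance between soft ranks and a target ranking (1-based)."""
    soft = soft_ranks(scores, temperature)
    return sum((r - t) ** 2 for r, t in zip(soft, target_ranks))


scores = [0.9, 0.1, 0.5]
sharp = soft_ranks(scores, temperature=0.01)  # close to the hard ranks 1, 3, 2
loss_match = rank_disagreement(scores, [1, 3, 2], temperature=0.01)
loss_mismatch = rank_disagreement(scores, [3, 1, 2], temperature=0.01)
```

In an actual gradient-based search, the same arithmetic would be written in an automatic-differentiation framework so that the loss can be minimized with respect to the parameters that produce `scores`, subject to a constraint that keeps the model inside the Rashomon set.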

🔎 Similar Papers

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective