🤖 AI Summary
Existing local interpretability methods (e.g., LIME) produce inconsistent, class-wise explanations for multi-class predictions, hindering a coherent understanding across classes. To address this, we propose the first local interpretability paradigm explicitly designed for multi-class classification: a model-agnostic, data-agnostic post-hoc surrogate model based on multi-output regression trees, enabling joint explanation of all classes. Our approach inherently ensures both cross-class consistency and fidelity to the original model, while natively supporting diverse explanation forms -- including counterfactuals -- without additional approximation. Evaluated on image and tabular classification tasks, our method significantly outperforms LIME in explanation fidelity, inter-class consistency, and user comprehensibility. These advantages are validated through quantitative metrics and controlled user studies, demonstrating both technical soundness and practical utility.
📝 Abstract
Explainable artificial intelligence provides tools to better understand predictive models and their decisions, but many such methods are limited to producing insights with respect to a single class. When generating explanations for several classes, reasoning over them to obtain a comprehensive view may be difficult since they can present competing or contradictory evidence. To address this challenge, we introduce the novel paradigm of multi-class explanations. We outline the theory behind such techniques and propose a local surrogate model based on multi-output regression trees -- called LIMEtree -- that offers faithful and consistent explanations of multiple classes for individual predictions while being post-hoc, model-agnostic and data-universal. On top of strong fidelity guarantees, our implementation delivers a range of diverse explanation types, including the counterfactual statements favoured in the literature. We evaluate our algorithm with respect to explainability desiderata, through quantitative experiments and via a pilot user study, on image and tabular data classification tasks, comparing it to LIME, a state-of-the-art surrogate explainer. Our contributions demonstrate the benefits of multi-class explanations and the wide-ranging advantages of our method across a diverse set of scenarios.
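The core idea -- fitting one multi-output regression tree to a black-box model's probabilities for *all* classes in the neighbourhood of a single instance -- can be sketched in a few lines. This is not the paper's implementation; it is a minimal illustration assuming a scikit-learn classifier as the black box, Gaussian perturbations as the local sampling scheme, and `DecisionTreeRegressor` (which natively supports multi-output targets) as the surrogate. Because every class shares the same tree structure, the resulting per-class explanations cannot contradict one another.

```python
# Hedged sketch of a LIMEtree-style multi-output local surrogate
# (illustrative only; sampling scheme and hyperparameters are assumptions).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Black-box model whose local behaviour we want to explain.
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Instance to explain, plus Gaussian perturbations sampled around it.
instance = X[0]
samples = instance + rng.normal(scale=0.5, size=(500, X.shape[1]))

# Query the black box for probabilities of *all* classes at once.
probs = black_box.predict_proba(samples)  # shape (500, n_classes)

# A single multi-output regression tree jointly approximates every
# class, so the explanations for different classes share one structure.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(samples, probs)

print(surrogate.predict(instance.reshape(1, -1)))  # per-class estimates
print(surrogate.feature_importances_)              # one shared attribution
```

Reading counterfactuals off such a surrogate amounts to traversing the tree to a leaf with a different dominant class, which is why the tree form supports multiple explanation types without extra approximation.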