InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts

📅 2024-02-05
📈 Citations: 4
Influential: 0
📄 PDF
🤖 AI Summary
In high-stakes, high-trust domains such as education and healthcare, existing intrinsically interpretable models, while faithful, produce explanations that are too fine-grained for users to understand and act on. To address this, the paper proposes InterpretCC, a topic-driven global Mixture-of-Experts (Global MoE) neural network that jointly integrates user-controllable thematic feature grouping, sparse gating, and conditional computation, unifying explanation faithfulness, human understandability, and predictive performance. The method natively supports text, time-series, and tabular inputs, achieves performance comparable to non-interpretable state-of-the-art models on several real-world benchmarks, and outperforms interpretable-by-design baselines. A user study shows that the resulting explanations are rated significantly more actionable and useful.

📝 Abstract
Interpretability for neural networks is a trade-off between three key requirements: 1) faithfulness of the explanation (i.e., how perfectly it explains the prediction), 2) understandability of the explanation by humans, and 3) model performance. Most existing methods compromise one or more of these requirements; e.g., post-hoc approaches provide limited faithfulness, automatically identified feature masks compromise understandability, and intrinsically interpretable methods such as decision trees limit model performance. These shortcomings are unacceptable for sensitive applications such as education and healthcare, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability, while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply variations of the InterpretCC architecture for text, time series and tabular data across several real-world benchmarks, demonstrating comparable performance with non-interpretable baselines, outperforming interpretable-by-design baselines, and showing higher actionability and usefulness according to a user study.
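To make the abstract's feature-gating idea concrete, here is a minimal PyTorch sketch of adaptive, sparse feature activation before prediction: a per-instance gate selects a small subset of input features, and only those features reach the predictor, so the selected features themselves form the explanation. All names, layer sizes, and the two-way Gumbel-Softmax relaxation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureGatingSketch(nn.Module):
    """Sketch of per-instance sparse feature activation: a gate network picks a
    small subset of input features and only those reach the predictor, so the
    selected features themselves form the explanation. Illustrative only."""

    def __init__(self, num_features: int, hidden_dim: int = 64,
                 num_classes: int = 2, tau: float = 1.0):
        super().__init__()
        self.gate = nn.Linear(num_features, num_features)  # one gating logit per feature
        self.predictor = nn.Sequential(
            nn.Linear(num_features, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )
        self.tau = tau

    def forward(self, x: torch.Tensor):
        logits = self.gate(x)
        if self.training:
            # Differentiable, near-binary mask via a two-way (keep vs. drop)
            # Gumbel-Softmax with straight-through gradients.
            mask = F.gumbel_softmax(
                torch.stack([logits, -logits], dim=-1), tau=self.tau, hard=True
            )[..., 0]
        else:
            # At inference the gate is thresholded, so unselected features
            # cannot influence the prediction.
            mask = (torch.sigmoid(logits) > 0.5).float()
        gated = x * mask
        return self.predictor(gated), mask  # mask = which features were used
```

Because the prediction is computed only from the gated features, the returned mask is directly faithful: a feature with mask value 0 had no effect on the output.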
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between model accuracy and human-centered explainability
Making explanations more actionable for downstream users
Preserving interpretability without sacrificing performance across diverse data types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive sparse feature activation for predictions
Global mixture-of-experts with topical subnetworks (see the sketch after this list)
User-specified topics for interpretable explanations
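Building on the feature-gating sketch above, the following minimal sketch shows how the global mixture-of-experts routing over user-specified topical subnetworks could be wired: features are partitioned into human-chosen topic groups, each group feeds its own small expert, and a sparse per-instance gate decides which topics contribute to the prediction. The topic names, feature slices, and straight-through gate are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class TopicalMoESketch(nn.Module):
    """Sketch of an interpretable global mixture-of-experts: features are split
    into human-specified topic groups, each topic has its own expert subnetwork,
    and a sparse per-instance gate decides which topics contribute to the
    prediction. Illustrative only; not the authors' exact implementation."""

    def __init__(self, topic_slices: dict, hidden_dim: int = 32, num_classes: int = 2):
        super().__init__()
        # e.g. topic_slices = {"engagement": slice(0, 5), "performance": slice(5, 12)}
        self.topic_slices = topic_slices
        self.experts = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(sl.stop - sl.start, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_classes),
            )
            for name, sl in topic_slices.items()
        })
        num_features = max(sl.stop for sl in topic_slices.values())
        self.gate = nn.Linear(num_features, len(topic_slices))  # one logit per topic

    def forward(self, x: torch.Tensor):
        probs = torch.sigmoid(self.gate(x))      # per-topic activation probability
        hard = (probs > 0.5).float()             # discrete on/off routing decision
        weights = hard + probs - probs.detach()  # straight-through: hard forward, soft backward
        outputs = []
        for i, (name, sl) in enumerate(self.topic_slices.items()):
            outputs.append(weights[:, i:i + 1] * self.experts[name](x[:, sl]))
        # The prediction is a sparse combination of topical experts; the names
        # of the activated topics double as the human-readable explanation.
        return torch.stack(outputs).sum(dim=0), weights
```

For example, `TopicalMoESketch({"engagement": slice(0, 5), "performance": slice(5, 12)})(torch.randn(4, 12))` returns class logits together with a 4x2 routing matrix indicating, per instance, which of the two hypothetical topic experts were activated.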
Vinitra Swamy
EPFL, UC Berkeley, Microsoft AI
Explainable AI · AI for Education
Julian Blackwell
Department of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, VD, Switzerland
Jibril Frej
Department of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, VD, Switzerland
Martin Jaggi
EPFL
Machine Learning · Optimization
Tanja Käser
Department of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, VD, Switzerland