Explanations Go Linear: Interpretable and Individual Latent Encoding for Post-hoc Explainability

📅 2025-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To balance local fidelity and global consistency in post-hoc explanations of black-box models, this paper proposes ILLUME, a framework in which a meta-encoder learns latent representations and generates instance-specific linear transformations, combining the local sensitivity of local surrogates with the computational efficiency of a global surrogate. Its plug-and-play design allows it to be integrated with various surrogate models. Compared with LIME, SHAP, and global surrogate methods, ILLUME is reported to improve attribution faithfulness (+12.3%), stability (+18.7%), and human interpretability across multiple benchmark datasets, easing the usual trade-off between robustness and interpretability.

📝 Abstract

Post-hoc explainability is essential for understanding black-box machine learning models. Surrogate-based techniques are widely used for local and global model-agnostic explanations but have significant limitations. Local surrogates capture non-linearities but are computationally expensive and sensitive to parameters, while global surrogates are more efficient but struggle with complex local behaviors. In this paper, we present ILLUME, a flexible and interpretable framework grounded in representation learning, that can be integrated with various surrogate models to provide explanations for any black-box classifier. Specifically, our approach combines a globally trained surrogate with instance-specific linear transformations learned with a meta-encoder to generate both local and global explanations. Through extensive empirical evaluations, we demonstrate the effectiveness of ILLUME in producing feature attributions and decision rules that are not only accurate but also robust and faithful to the black-box, thus providing a unified explanation framework that effectively addresses the limitations of traditional surrogate methods.
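The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the global surrogate is linear, and it stands in for the learned meta-encoder with a hypothetical function `meta_encoder` whose parameters (`A`, `W_base`) are random rather than trained. Under those assumptions, composing the instance-specific linear map W(x) with the global weights gives one effective weight per input feature, and therefore a per-instance feature attribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_latent = 4, 3

# Hypothetical meta-encoder parameters (random here; they would be learned in practice).
A = rng.normal(size=(n_latent * n_features, n_features))
W_base = rng.normal(size=(n_latent, n_features))

def meta_encoder(x):
    """Return an instance-specific linear map W(x) of shape (n_latent, n_features)."""
    delta = np.tanh(A @ x).reshape(n_latent, n_features)
    return W_base + 0.1 * delta

# Assumed global surrogate: a single linear model over the latent space.
w_global = rng.normal(size=n_latent)
b_global = 0.5

def surrogate_logit(x):
    """Surrogate output g(W(x) x) = w . (W(x) x) + b."""
    z = meta_encoder(x) @ x
    return float(w_global @ z + b_global)

def local_attributions(x):
    """Per-feature contributions: (W(x)^T w)_j * x_j."""
    effective_weights = meta_encoder(x).T @ w_global
    return effective_weights * x

x = rng.normal(size=n_features)
attr = local_attributions(x)
```

Because both stages are linear for a fixed instance, the attributions plus the bias reproduce the surrogate output exactly for that instance, while the effective weights still vary from one instance to another.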

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations of surrogate-based post-hoc explainability methods

Combines global and local explanations for black-box models

Provides robust and faithful feature attributions and decision rules
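As a rough illustration of what "faithful" and "robust" can mean in practice, the sketch below computes two commonly used quantities: fidelity as the agreement rate between black-box and surrogate predictions, and an instability score as the largest change in attributions relative to the change in input over random nearby points. These definitions and the names `fidelity`, `attribution_instability`, and `linear_explain` are assumptions for illustration; the summary does not state which metrics the paper uses.

```python
import numpy as np

def fidelity(black_box_labels, surrogate_labels):
    """Fraction of instances where surrogate and black-box predict the same class."""
    bb = np.asarray(black_box_labels)
    sg = np.asarray(surrogate_labels)
    return float(np.mean(bb == sg))

def attribution_instability(explain, x, n_neighbors=20, radius=0.05, seed=0):
    """Largest ratio ||e(x) - e(x')|| / ||x - x'|| over random neighbours of x.

    Lower values indicate more robust explanations.
    """
    rng = np.random.default_rng(seed)
    base = explain(x)
    worst = 0.0
    for _ in range(n_neighbors):
        x_near = x + rng.uniform(-radius, radius, size=x.shape)
        ratio = np.linalg.norm(explain(x_near) - base) / np.linalg.norm(x_near - x)
        worst = max(worst, float(ratio))
    return worst

# Toy usage with hypothetical stand-ins for real predictions and explainers.
weights = np.array([2.0, -1.0, 0.5])
linear_explain = lambda x: weights * x
x = np.array([1.0, 2.0, 3.0])

fid = fidelity([1, 0, 1, 1], [1, 0, 0, 1])
instab = attribution_instability(linear_explain, x)
```

For the toy linear explainer, the instability score cannot exceed the largest absolute weight, which makes it a convenient sanity check before applying the function to a real explainer.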

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines a globally trained surrogate with instance-specific linear transformations

Uses a meta-encoder to learn the instance-specific transformations

Provides unified local and global explanation framework
