Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons

📅 2025-02-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses key challenges in generating minimal sufficient reasons (the smallest feature subsets that, when held fixed at their values, guarantee the model's prediction is unchanged) for neural networks: high computational cost, reliance on out-of-distribution sampling, and distortion introduced by post-hoc explanation. The paper proposes sufficient subset training (SST), a self-supervised paradigm that embeds sufficient-subset discovery directly into end-to-end model training. SST uses a discrete subset parameterization and self-supervised optimization, so the network natively outputs concise, faithful sufficient explanations at inference time. Compared with dominant post-hoc methods, SST avoids iterative search, substantially improving explanation efficiency, while also improving faithfulness and preserving predictive accuracy. Its core contribution is the first end-to-end, native generation of sufficient reasons, bypassing search over a fixed black-box predictor and mitigating distortions induced by out-of-distribution perturbations.
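The sufficiency property at the heart of the paper can be illustrated with a toy empirical check. This is a hedged sketch: the linear model, its weights, and the `is_sufficient` helper are illustrative assumptions, not from the paper, and SST itself learns the subset during training rather than verifying it by sampling as done here.

```python
import random

# Toy linear model: predicts 1 if w . x > 0, else 0.
W = [3.0, 0.1, -0.05, 2.5]

def predict(x):
    return int(sum(w * xi for w, xi in zip(W, x)) > 0)

def is_sufficient(x, mask, n_samples=500, low=-1.0, high=1.0, seed=0):
    """Empirically test sufficiency: hold the masked-in features fixed at
    their values in x, resample all other features uniformly, and check
    that the prediction never changes."""
    rng = random.Random(seed)
    y = predict(x)
    for _ in range(n_samples):
        z = [xi if m else rng.uniform(low, high)
             for xi, m in zip(x, mask)]
        if predict(z) != y:
            return False
    return True

x = [1.0, 0.2, -0.3, 1.0]
# Keeping only the two high-weight features (indices 0 and 3) already
# pins the prediction: their fixed contribution (5.5) dominates the
# bounded contribution of the free features.
print(is_sufficient(x, [1, 0, 0, 1]))  # → True
```

Note that the resampling step is exactly the kind of out-of-distribution probing the paper argues against for post-hoc methods; SST sidesteps it by making the subset a learned output of the network itself.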

📝 Abstract
Minimal sufficient reasons represent a prevalent form of explanation - the smallest subset of input features which, when held constant at their corresponding values, ensure that the prediction remains unchanged. Previous post-hoc methods attempt to obtain such explanations but face two main limitations: (1) Obtaining these subsets poses a computational challenge, leading most scalable methods to converge towards suboptimal, less meaningful subsets; (2) These methods heavily rely on sampling out-of-distribution input assignments, potentially resulting in counterintuitive behaviors. To tackle these limitations, we propose in this work a self-supervised training approach, which we term *sufficient subset training* (SST). Using SST, we train models to generate concise sufficient reasons for their predictions as an integral part of their output. Our results indicate that our framework produces succinct and faithful subsets substantially more efficiently than competing post-hoc methods, while maintaining comparable predictive performance.
Problem

Research questions and friction points this paper is trying to address.

High computational cost of extracting minimal sufficient subsets
Scalable post-hoc methods converging to suboptimal, less meaningful subsets
Reliance on out-of-distribution input sampling, causing counterintuitive behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised training approach
Generates concise sufficient reasons
More efficient than post-hoc methods
🔎 Similar Papers
No similar papers found.