🤖 AI Summary
This work addresses the poor interpretability of deep neural networks and the limited expressiveness of directly enforced monotonic architectures by decomposing a trained ReLU network, via a difference-of-convex (DC) decomposition, into the difference of two monotone convex functions. Building on this decomposition, the authors introduce two new saliency methods, SplitCAM and SplitLRP, which exploit monotonicity without modifying the original model architecture; they further show that training a model directly as the difference of two monotone networks yields strong self-explainability. Evaluated on ImageNet-S with VGG16 and ResNet18 backbones, the proposed saliency methods outperform state-of-the-art methods across all Quantus evaluation metric categories, supporting the effectiveness of DC decomposition for faithful explanations.
📝 Abstract
It has been demonstrated in various contexts that monotonicity leads to better explainability in neural networks. However, not every function can be well approximated by a monotone neural network. We demonstrate that monotonicity can still be used in two ways to boost explainability. First, we adapt a decomposition of a trained ReLU network into the difference of two monotone, convex parts, thereby overcoming numerical obstacles from an inherent blowup of the weights in this procedure. Our proposed saliency methods, SplitCAM and SplitLRP, improve on state-of-the-art results on both VGG16 and ResNet18 networks on ImageNet-S across all Quantus saliency metric categories. Second, we show that training a model as the difference of two monotone neural networks results in a system with strong self-explainability properties.
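The decomposition mentioned in the abstract can be sketched for a plain ReLU MLP. The sketch below (our own illustration, not the authors' adapted algorithm; function and variable names are assumptions) uses the standard DC splitting of each weight matrix, W = W₊ − W₋, together with the identity ReLU(g − h) = max(g, h) − h, to propagate a pair of monotone convex functions whose difference equals the original network:

```python
import numpy as np

def relu_net_dc(weights, biases, x):
    """Evaluate a ReLU MLP f(x) as g(x) - h(x), where g and h are
    built to be convex and coordinate-wise monotone.

    A minimal sketch of the textbook DC splitting; the paper's
    adaptation (which controls the weight blowup) is not reproduced.
    `weights`/`biases` are per-layer arrays; hidden layers use ReLU,
    the final layer is affine.
    """
    g, h = x.copy(), np.zeros_like(x)          # f = g - h holds at the input
    for i, (W, b) in enumerate(zip(weights, biases)):
        Wp, Wn = np.maximum(W, 0.0), np.maximum(-W, 0.0)   # W = Wp - Wn
        # Affine step: W(g - h) + b = (Wp g + Wn h + b) - (Wp h + Wn g).
        # Nonnegative combinations preserve convexity and monotonicity.
        g, h = Wp @ g + Wn @ h + b, Wp @ h + Wn @ g
        if i < len(weights) - 1:
            # ReLU step: ReLU(g - h) = max(g, h) - h; a pointwise max of
            # convex monotone functions is again convex and monotone.
            g = np.maximum(g, h)
    return g, h
```

Note that both branches accumulate the magnitudes |W| of every layer, so g and h individually grow much faster than their difference; this is the weight blowup the abstract says must be handled numerically.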