🤖 AI Summary
Medical deep learning interpretability is hindered by semantic distortion in path attribution caused by fixed baselines (e.g., all-zero inputs) that lack clinical meaning. To address this, we propose a counterfactual-guided adaptive baseline selection framework: "missingness" is redefined using clinically plausible counterfactual samples that are normal yet semantically close to the input, thereby overcoming the semantic limitations of conventional baselines. Our method employs a variational autoencoder to generate individualized counterfactual baselines and integrates them with path-integrated gradients to yield medically grounded attributions. The framework is model-agnostic and technically extensible. Experiments across three medical datasets demonstrate that our approach significantly improves attribution faithfulness and clinical relevance compared to standard baselines, yielding more interpretable and trustworthy feature importance distributions. To our knowledge, this is the first work to systematically reconstruct the notion of "missingness" in path attribution from a clinical semantic perspective.
📝 Abstract
The explainability of deep learning models remains a significant challenge, particularly in the medical domain where interpretable outputs are critical for clinical trust and transparency. Path attribution methods such as Integrated Gradients rely on a baseline input representing the absence of relevant features ("missingness"). Commonly used baselines, such as all-zero inputs, are often semantically meaningless, especially in medical contexts where missingness can itself be informative. While alternative baseline choices have been explored, existing methods lack a principled approach to dynamically select baselines tailored to each input. In this work, we examine the notion of missingness in the medical setting, analyze its implications for baseline selection, and introduce a counterfactual-guided approach to address the limitations of conventional baselines. We argue that a clinically normal but input-close counterfactual provides a more meaningful representation of an absence of features in medical data. To implement this, we use a Variational Autoencoder to generate counterfactual baselines, though our concept is generative-model-agnostic and can be applied with any suitable counterfactual method. We evaluate the approach on three distinct medical datasets and empirically demonstrate that counterfactual baselines yield more faithful and medically relevant attributions compared to standard baseline choices.
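The core idea can be sketched in a few lines. The snippet below is a minimal toy illustration, not the paper's implementation: it approximates Integrated Gradients with a Riemann sum, and it stands in for the paper's VAE with a hypothetical nearest-neighbour pick from a pool of "normal" samples (the model `f`, the weights `w`, and the sample values are all made up for the example). The completeness check at the end (attributions summing to the model-output difference between input and baseline) holds for any choice of baseline, which is what lets a counterfactual baseline slot into the same attribution machinery.

```python
import numpy as np

# Hypothetical toy model: f(x) = w @ x, so grad f = w everywhere.
w = np.array([0.8, -1.2, 2.0])
f = lambda x: float(w @ x)
grad_f = lambda x: w

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Riemann-sum (midpoint rule) approximation of Integrated Gradients
    along the straight-line path from `baseline` to `x`."""
    alphas = (np.arange(steps) + 0.5) / steps
    diff = x - baseline
    avg_grad = sum(grad_fn(baseline + a * diff) for a in alphas) / steps
    return diff * avg_grad

def counterfactual_baseline(x, normal_pool):
    """Stand-in for the paper's VAE: return the 'clinically normal'
    sample closest to the input (a hypothetical nearest-neighbour proxy)."""
    dists = np.linalg.norm(normal_pool - x, axis=1)
    return normal_pool[np.argmin(dists)]

x = np.array([1.0, 0.5, 3.0])           # toy patient input
normals = np.array([[0.9, 0.4, 1.0],    # toy pool of normal samples
                    [5.0, 5.0, 5.0]])
baseline = counterfactual_baseline(x, normals)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline).
print(np.allclose(attr.sum(), f(x) - f(baseline)))  # True
```

Replacing the nearest-neighbour proxy with a generative model that decodes a "normalized" latent code for each input is what makes the baseline adaptive per patient rather than fixed.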