🤖 AI Summary
This work addresses the impractical computational cost of conventional KernelSHAP in patch-based 3D medical image segmentation, which stems from the large number of coalition evaluations and the expense of sliding-window inference. Focusing on whole-body CT segmentation, the authors propose an efficient KernelSHAP framework that restricts computation to a user-defined region of interest and its receptive-field support, and that adds a patch logit caching mechanism which reuses baseline predictions for unaffected patches while preserving nnU-Net's fusion scheme. Within the receptive-field crop, the framework compares three feature abstractions (whole-organ units, regular face-centered cubic (FCC) supervoxels, and hybrid organ-aware supervoxels) and several value functions that target either stabilizing evidence (True Positive, Dice, and Soft Dice) or false-positive behavior. Experiments on whole-body CT segmentation show that caching reduces computation by 15% to 30%. Regular supervoxels often score highest on perturbation-based metrics but lack anatomical alignment, whereas organ-aware units are more clinically interpretable and more effective at exposing the drivers of false positives under normalized metrics.
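The value functions named above can be illustrated with their standard definitions. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the reference mask `ref` is assumed to be a binary mask inside the region of interest (for example, the model's prediction on the unperturbed image), which the summary does not specify.

```python
import numpy as np

def true_positive(pred, ref):
    """Count voxels that are foreground in both the binary prediction and the reference."""
    return float(np.logical_and(pred, ref).sum())

def dice(pred, ref, eps=1e-8):
    """Dice overlap between a binary prediction and a binary reference mask."""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return float(2.0 * intersection / (pred.sum() + ref.sum() + eps))

def soft_dice(prob, ref, eps=1e-8):
    """Dice computed on foreground probabilities instead of a thresholded mask."""
    prob = np.asarray(prob, dtype=float)
    ref = np.asarray(ref, dtype=float)
    intersection = (prob * ref).sum()
    return float(2.0 * intersection / (prob.sum() + ref.sum() + eps))
```

In a KernelSHAP setting, each coalition (a subset of supervoxels or organ units left unperturbed) would be scored by one such scalar, so the choice of value function determines what the attributions explain. Soft Dice varies smoothly with the predicted probabilities, whereas True Positive and Dice only change when a voxel crosses the decision threshold.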
📝 Abstract
Perturbation-based explainability methods such as KernelSHAP provide model-agnostic attributions but are typically impractical for patch-based 3D medical image segmentation due to the large number of coalition evaluations and the high cost of sliding-window inference. We present an efficient KernelSHAP framework for volumetric CT segmentation that restricts computation to a user-defined region of interest and its receptive-field support. The framework accelerates inference via patch logit caching, which reuses baseline predictions for unaffected patches while preserving nnU-Net's fusion scheme. To enable clinically meaningful attributions, we compare three automatically generated feature abstractions within the receptive-field crop: whole-organ units, regular FCC supervoxels, and hybrid organ-aware supervoxels. We also study multiple aggregation/value functions targeting stabilizing evidence (TP/Dice/Soft Dice) or false-positive behavior. Experiments on whole-body CT segmentations show that caching substantially reduces redundant computation, with savings ranging from 15% to 30%, and that faithfulness and interpretability exhibit clear trade-offs: regular supervoxels often maximize perturbation-based metrics but lack anatomical alignment, whereas organ-aware units yield more clinically interpretable explanations and are particularly effective for highlighting false-positive drivers under normalized metrics.
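The patch logit caching idea described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' code: `patch_slices` and `CachedSlidingWindow` are hypothetical names, `model` is any callable that maps a patch to same-shaped logits, and overlapping patches are fused with uniform averaging, whereas nnU-Net's actual fusion uses its own weighting.

```python
import itertools
import numpy as np

def patch_slices(shape, patch, stride):
    """Return sliding-window patches (as tuples of slices) that cover the volume."""
    starts = []
    for n, p, s in zip(shape, patch, stride):
        axis = list(range(0, max(n - p, 0) + 1, s))
        if axis[-1] + p < n:  # add a final patch so the border is covered
            axis.append(n - p)
        starts.append(axis)
    return [
        tuple(slice(c, c + p) for c, p in zip(corner, patch))
        for corner in itertools.product(*starts)
    ]

class CachedSlidingWindow:
    """Sliding-window inference that reuses baseline logits for unchanged patches."""

    def __init__(self, model, baseline, patch, stride):
        self.model = model
        self.baseline = baseline
        self.slices = patch_slices(baseline.shape, patch, stride)
        # Baseline logits are computed once and reused across coalitions.
        self.cache = [model(baseline[s]) for s in self.slices]
        self.calls = 0  # number of patches re-run after the baseline pass

    def predict(self, perturbed):
        changed = perturbed != self.baseline
        total = np.zeros(perturbed.shape, dtype=float)
        weight = np.zeros(perturbed.shape, dtype=float)
        for s, cached in zip(self.slices, self.cache):
            if changed[s].any():
                # The patch input differs from the baseline, so re-run the model.
                logits = self.model(perturbed[s])
                self.calls += 1
            else:
                # The patch input is identical, so the cached logits are exact.
                logits = cached
            total[s] += logits
            weight[s] += 1.0
        return total / weight  # uniform fusion of overlapping patches (assumption)
```

Because a patch's logits depend only on the voxels inside that patch, reusing cached logits for untouched patches gives the same fused output as a full recomputation. The saving for a given coalition is the fraction of patches that the perturbed voxels do not intersect, which is why restricting perturbations to a region of interest and its receptive field makes the cache effective.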