🤖 AI Summary
This work identifies and quantifies an implicit background dependency in post-hoc concept-embedding C-XAI methods for visual DNNs: such methods frequently misattribute background statistical shortcuts as semantic concepts, leading to explanation failures under atypical backgrounds (e.g., animals on roads). To address this, the authors propose a systematic diagnostic framework: a scalable evaluation paradigm based on background randomization, integrating a Net2Vec variant for concept activation analysis and a multi-concept benchmark spanning 50+ concepts, two datasets, and seven model architectures. Empirical results provide the first large-scale evidence of severe background bias in mainstream C-XAI methods. A key finding is that lightweight background perturbations substantially improve both concept robustness and segmentation generalizability. This establishes a practical, theoretically grounded pathway toward background-robust concept-based explanations.
📝 Abstract
The thriving research field of concept-based explainable artificial intelligence (C-XAI) investigates how human-interpretable semantic concepts are embedded in the latent spaces of deep neural networks (DNNs). Post-hoc approaches therein use a set of examples to specify a concept, and determine its embedding in DNN latent space using data-driven techniques. This has proved useful for uncovering biases between different target (foreground or concept) classes. However, given that the background is mostly uncontrolled during training, an important question has so far been left unattended: to what extent are state-of-the-art, data-driven post-hoc C-XAI approaches themselves prone to biases with respect to their backgrounds? For example, wild animals mostly occur against vegetation backgrounds and seldom appear on roads. Even simple and robust C-XAI methods might exploit this shortcut for enhanced performance. A dangerous performance degradation on concept corner cases, such as animals on the road, could thus remain undiscovered. This work validates and thoroughly confirms that established Net2Vec-based concept segmentation techniques frequently capture background biases, including alarming ones, such as underperformance on road scenes. For the analysis, we compare 3 established techniques from the domain of background randomization on more than 50 concepts from 2 datasets, and 7 diverse DNN architectures.
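To make the core mechanism concrete: Net2Vec-style methods embed a concept as a weight vector over the channels of a DNN activation map, trained per-pixel (effectively a 1x1 convolution plus sigmoid) against binary concept masks. The following is a minimal sketch of that idea using synthetic NumPy activations in place of real DNN features; the data, dimensions, and training loop are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Sketch of a Net2Vec-style concept probe on SYNTHETIC activations:
# learn a weight vector w over C channels so that sigmoid(w . a + b)
# segments a binary concept mask, trained per-pixel with logistic
# regression (BCE loss, plain gradient descent).

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16

# Synthetic "activations": channel 3 fires on concept pixels,
# all other channels are background noise.
mask = np.zeros((H, W))
mask[4:12, 4:12] = 1.0
acts = rng.normal(0.0, 0.3, size=(C, H, W))
acts[3] += 2.0 * mask

X = acts.reshape(C, -1).T          # (H*W, C): per-pixel channel features
y = mask.ravel()                   # (H*W,): per-pixel concept labels

w, b = np.zeros(C), 0.0
for _ in range(500):               # gradient descent on per-pixel BCE
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 1.0 * (X.T @ (p - y)) / len(y)
    b -= 1.0 * np.mean(p - y)

# Threshold the probe's output and score it with IoU against the mask.
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).reshape(H, W)
iou = (pred * mask).sum() / ((pred + mask) > 0).sum()
print(f"dominant channel: {np.argmax(np.abs(w))}, IoU: {iou:.2f}")
```

The background-bias question the paper studies arises exactly here: if concept pixels co-occur with particular background statistics in the training examples, the learned w can load on background-correlated channels rather than the concept itself, which background randomization of the example images is meant to expose.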