What is Missing? Explaining Neurons Activated by Absent Concepts

📅 2026-03-10
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical limitation in current explainable artificial intelligence (XAI) methods: their inability to capture "absence coding", a phenomenon wherein neurons are activated specifically due to the absence of certain concepts. We systematically uncover and formally characterize this mechanism for the first time, proposing concise extensions to attribution methods and feature visualization techniques to identify and interpret such absence-driven activations. Through empirical analysis on ImageNet-pretrained models, we demonstrate that absence coding is widespread and that explicitly incorporating it into explanatory frameworks significantly enhances model debiasing performance. Our findings offer a novel perspective on the causal reasoning mechanisms underlying deep neural networks, advancing the interpretability and reliability of modern AI systems.
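
The page does not spell out the proposed attribution extension, but one simple way to probe absence-driven activations with a standard attribution method is to keep the sign of the attributions rather than reading only the strongest positive contributions. Below is a minimal sketch, assuming gradient-times-input attributions on a torchvision ResNet-50; the model, target class, and random input are illustrative placeholders, not the paper's setup.

```python
# Minimal sketch, not the paper's method: gradient-x-input attributions that
# keep their sign. Mainstream attribution maps are usually read for strong
# positive evidence; inspecting the negative part separately is one crude way
# to find regions whose content pushes the prediction down, i.e. hints of
# concepts the model treats as relevant by their absence.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def signed_attributions(x, target_class):
    """Return the positive and negative parts of gradient-x-input attributions."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    attr = (x.grad * x).sum(dim=1, keepdim=True)    # aggregate over RGB channels
    return attr.clamp(min=0), (-attr).clamp(min=0)  # supporting vs. opposing evidence

# Usage: x should be a normalized 1x3x224x224 image; a random tensor is used
# here only to keep the sketch self-contained.
x = torch.randn(1, 3, 224, 224)
pos_map, neg_map = signed_attributions(x, target_class=207)
```
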

๐Ÿ“ Abstract
Explainable artificial intelligence (XAI) aims to provide human-interpretable insights into the behavior of deep neural networks (DNNs), typically by estimating a simplified causal structure of the model. In existing work, this causal structure often includes relationships where the presence of a concept is associated with a strong activation of a neuron. For example, attribution methods primarily identify input pixels that contribute most to a prediction, and feature visualization methods reveal inputs that cause high activation of a target neuron; the former implicitly assumes that the relevant information resides in the input, and the latter that neurons encode the presence of concepts. However, a largely overlooked type of causal relationship is that of encoded absences, where the absence of a concept increases neural activation. In this work, we show that such missing but relevant concepts are common and that mainstream XAI methods struggle to reveal them when applied in their standard form. To address this, we propose two simple extensions to attribution and feature visualization techniques that uncover encoded absences. Across experiments, we show how mainstream XAI methods can be used to reveal and explain encoded absences, how ImageNet models exploit them, and that debiasing can be improved when considering them.
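
As a rough, hedged illustration of the feature-visualization side (not the paper's actual technique): standard activation maximization synthesizes an input that drives a unit's activation up, and running the same optimization with the sign flipped synthesizes an input that drives it down, which for an absence-coding unit can expose the concept whose presence suppresses it. The layer choice, unit index, and L2 regularizer below are arbitrary assumptions.

```python
# Minimal sketch, assuming a torchvision ResNet-50: vanilla activation
# maximization for one unit, plus the complementary minimization. For a unit
# that encodes an absence, the minimization image may reveal the "missing"
# concept that the unit responds to when it is not there.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

activation = {}
def hook(_, __, output):
    activation["feat"] = output

layer = model.layer4[2].conv3          # placeholder choice of layer
layer.register_forward_hook(hook)

def visualize(unit=0, steps=200, lr=0.05, maximize=True):
    """Optimize an input image to maximize (or minimize) a unit's mean activation."""
    img = torch.randn(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([img], lr=lr)
    sign = 1.0 if maximize else -1.0
    for _ in range(steps):
        opt.zero_grad()
        model(img)
        act = activation["feat"][0, unit].mean()
        loss = -sign * act + 1e-4 * img.pow(2).mean()  # small L2 regularizer
        loss.backward()
        opt.step()
    return img.detach()

presence_img = visualize(maximize=True)    # what makes the unit fire
absence_img = visualize(maximize=False)    # what drives it down, i.e. the suppressing concept
```
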
Problem

Research questions and friction points this paper is trying to address.

explainable AI
absent concepts
neural activation
causal relationships
feature visualization
Innovation

Methods, ideas, or system contributions that make the work stand out.

encoded absences
explainable AI
neuron activation
feature visualization
attribution methods