The Invisible Gorilla Effect in Out-of-distribution Detection

📅 2026-02-23
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study addresses a critical yet previously unexplained failure mode in out-of-distribution (OOD) detection: severe performance degradation when confronted with subtle artifacts visually dissimilar to the model's region of interest (ROI). The authors identify and name this phenomenon the "Invisible Gorilla Effect," revealing an implicit reliance of OOD detectors on visual similarity between artifacts and the ROI. To validate this effect, they manually annotate artifact colors in 11,355 images, construct counterfactual samples via color swapping, and systematically evaluate 40 OOD methods across seven benchmarks. Experiments demonstrate that, in tasks such as skin lesion classification, mainstream approaches like the Mahalanobis Score suffer AUROC drops of up to 31.5% when artifact colors diverge from those of the ROI, underscoring a fundamental vulnerability in current OOD detection frameworks.
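The Mahalanobis Score mentioned above can be sketched as follows. This is a minimal illustration of the standard detector (class-conditional Gaussians in feature space with a tied covariance), not the paper's exact implementation; the function names and the small regularization term are assumptions for this sketch.

```python
import numpy as np

def fit_mahalanobis(features, labels):
    """Fit per-class means and a shared (tied) covariance on in-distribution features."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    # Pool class-centered features to estimate one covariance shared by all classes.
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])  # small ridge for stability (assumed)
    precision = np.linalg.inv(cov)
    return means, precision

def mahalanobis_score(x, means, precision):
    """OOD score: negative squared Mahalanobis distance to the closest class mean.

    Higher scores mean "more in-distribution"; a threshold on this score
    decides whether a sample is flagged as OOD.
    """
    dists = [(x - mu) @ precision @ (x - mu) for mu in means.values()]
    return -min(dists)
```

In practice the features would come from a penultimate layer of the trained classifier; here any feature matrix works, which is what makes the detector sensitive to whatever visual cues (such as color) dominate that feature space.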

📝 Abstract
Deep Neural Networks achieve high performance in vision tasks by learning features from regions of interest (ROI) within images, but their performance degrades when deployed on out-of-distribution (OOD) data that differs from training data. This challenge has led to OOD detection methods that aim to identify and reject unreliable predictions. Although prior work shows that OOD detection performance varies by artefact type, the underlying causes remain underexplored. To this end, we identify a previously unreported bias in OOD detection: for hard-to-detect artefacts (near-OOD), detection performance typically improves when the artefact shares visual similarity (e.g. colour) with the model's ROI and drops when it does not - a phenomenon we term the Invisible Gorilla Effect. For example, in a skin lesion classifier with red lesion ROI, we show the method Mahalanobis Score achieves a 31.5% higher AUROC when detecting OOD red ink (similar to ROI) compared to black ink (dissimilar) annotations. We annotated artefacts by colour in 11,355 images from three public datasets (e.g. ISIC) and generated colour-swapped counterfactuals to rule out dataset bias. We then evaluated 40 OOD methods across 7 benchmarks and found significant performance drops for most methods when artefacts differed from the ROI. Our findings highlight an overlooked failure mode in OOD detection and provide guidance for more robust detectors. Code and annotations are available at: https://github.com/HarryAnthony/Invisible_Gorilla_Effect.
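The color-swapped counterfactuals used to rule out dataset bias can be approximated as below. This is a hedged sketch, not the authors' annotation pipeline: it assumes a binary artifact mask is available (the paper's annotations are manual) and recolors the masked region while preserving its brightness pattern; `color_swap` and its arguments are illustrative names.

```python
import numpy as np

def color_swap(image, artifact_mask, new_rgb):
    """Recolor the annotated artifact region of an RGB image.

    image:         (H, W, 3) uint8 array.
    artifact_mask: (H, W) boolean array marking artifact pixels.
    new_rgb:       target color, e.g. (0, 0, 0) to turn red ink black.

    The per-pixel luminance of the artifact is kept, so texture and
    shading survive the swap; only the hue changes.
    """
    out = image.astype(float).copy()
    # Normalized brightness of each artifact pixel, shape (n_pixels, 1).
    lum = image[artifact_mask].mean(axis=-1, keepdims=True) / 255.0
    out[artifact_mask] = lum * np.asarray(new_rgb, dtype=float)
    return out.clip(0, 255).astype(np.uint8)
```

Running a detector on the original and the swapped image isolates the effect of artifact color: everything else about the sample, including the artifact's shape and position, is held fixed.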
Problem

Research questions and friction points this paper is trying to address.

Out-of-distribution detection
Invisible Gorilla Effect
Region of interest
Artifacts
Visual similarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Invisible Gorilla Effect
Out-of-distribution detection
Region of interest (ROI)
Visual similarity bias
Counterfactual evaluation