ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the prevailing hypothesis that CNNs inherently prefer texture over shape, identifying methodological limitations (including forced-choice confounds) in Geirhos et al.'s cue-conflict experiments. Method: the authors propose a domain-agnostic framework that quantitatively measures feature reliance by systematically suppressing shape, texture, and color cues, enabling unbiased evaluation of feature preferences across diverse tasks: ImageNet classification, medical imaging, and remote sensing. Contribution/Results: experiments reveal that standard CNNs rely primarily on local shape, not texture, while medical imaging models exhibit a strong color bias and remote sensing models show heightened texture dependence. Moreover, modern architectures (e.g., ConvNeXt, ViT) and advanced training strategies substantially attenuate this shape reliance. Comparative analysis with human perceptual data further validates these findings, revealing both domain specificity and plasticity in feature reliance. All code is publicly available.

📝 Abstract
The hypothesis that Convolutional Neural Networks (CNNs) are inherently texture-biased has shaped much of the discourse on feature use in deep learning. We revisit this hypothesis by examining limitations in the cue-conflict experiment by Geirhos et al. To address these limitations, we propose a domain-agnostic framework that quantifies feature reliance through systematic suppression of shape, texture, and color cues, avoiding the confounds of forced-choice conflicts. By evaluating humans and neural networks under controlled suppression conditions, we find that CNNs are not inherently texture-biased but predominantly rely on local shape features. Nonetheless, this reliance can be substantially mitigated through modern training strategies or architectures (ConvNeXt, ViTs). We further extend the analysis across computer vision, medical imaging, and remote sensing, revealing that reliance patterns differ systematically: computer vision models prioritize shape, medical imaging models emphasize color, and remote sensing models exhibit a stronger reliance on texture. Code is available at https://github.com/tomburgert/feature-reliance.
Problem

Research questions and friction points this paper is trying to address.

Revisiting whether CNNs are inherently texture-biased through controlled experiments
Quantifying feature reliance by suppressing shape, texture, and color cues systematically
Analyzing feature reliance patterns across computer vision, medical imaging, and remote sensing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-agnostic framework for quantifying feature reliance
Systematic suppression of shape, texture, and color cues
Evaluation of models across computer vision, medical imaging, and remote sensing
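The controlled-suppression idea can be sketched with simple image transforms: grayscaling suppresses color, patch-wise averaging suppresses fine texture, and patch shuffling suppresses shape while preserving texture statistics. Feature reliance is then read off as the accuracy drop a model suffers under each suppression. The sketch below is purely illustrative and assumed, not the paper's actual operators (see the linked repository for those); function names, patch sizes, and the reliance metric are hypothetical.

```python
import numpy as np

def suppress_color(img):
    """Suppress the color cue: replace RGB with per-pixel mean luminance."""
    gray = img.mean(axis=2, keepdims=True)
    return np.repeat(gray, 3, axis=2)

def suppress_texture(img, cell=8):
    """Suppress fine texture: average each cell x cell patch,
    keeping coarse shape and color."""
    h, w, _ = img.shape
    out = img.copy()
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            out[y:y + cell, x:x + cell] = img[y:y + cell, x:x + cell].mean(axis=(0, 1))
    return out

def suppress_shape(img, cell=8, seed=0):
    """Suppress shape: shuffle non-overlapping patches, destroying object
    geometry while preserving local texture and color statistics."""
    h, w, _ = img.shape
    rng = np.random.default_rng(seed)
    patches = [img[y:y + cell, x:x + cell]
               for y in range(0, h - h % cell, cell)
               for x in range(0, w - w % cell, cell)]
    rng.shuffle(patches)
    out = img.copy()
    i = 0
    for y in range(0, h - h % cell, cell):
        for x in range(0, w - w % cell, cell):
            out[y:y + cell, x:x + cell] = patches[i]
            i += 1
    return out

def reliance(acc_clean, acc_suppressed):
    """Proxy for reliance on a cue: relative accuracy drop when it is suppressed."""
    return (acc_clean - acc_suppressed) / acc_clean
```

In this framing, each cue is removed in isolation rather than pitted against another, which is what avoids the forced-choice confound of cue-conflict stimuli: a model that ignores texture simply shows a small `reliance` value under texture suppression, instead of being forced to "vote" for shape or texture.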