Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses regional spurious correlations in vision transformers (ViTs) induced by dataset bias. We propose a token-dropping-based attribution method, the first applied to localizing the non-robust local features on which models spuriously rely. Through large-scale experiments on ImageNet, we characterize how the training paradigm (supervised vs. self-supervised) affects susceptibility to such spurious correlations, and we identify multiple classes that exhibit significant region-level spurious signals, accompanied by a verifiable list of the affected images. We further validate the method’s effectiveness and generalizability on a clinical breast lesion classification task. Our approach provides a diagnostic and mitigation tool for spurious correlations in ViTs, supporting the development of more trustworthy, robust, and interpretable vision models.

📝 Abstract

Due to their powerful feature association capabilities, neural network-based computer vision models can detect and exploit unintended patterns within the data, potentially leading to correct predictions based on incorrect or unintended but statistically relevant signals. These clues may vary from simple color aberrations to small pieces of text within the image. In situations where these unintended signals align with the predictive task, models can mistakenly link these features with the task and rely on them for making predictions. This phenomenon is referred to as spurious correlations, where patterns appear to be associated with the task but are actually coincidental. As a result, detection and mitigation of spurious correlations have become crucial tasks for building trustworthy, reliable, and generalizable machine learning models. In this work, we present a novel method to detect spurious correlations in vision transformers, a type of neural network architecture that has gained significant popularity in recent years. Using models trained with both supervised and self-supervised methods, we present large-scale experiments on the ImageNet dataset demonstrating the ability of the proposed method to identify spurious correlations. We also find that, even if the same architecture is used, the training methodology has a significant impact on the model's reliance on spurious correlations. Furthermore, we show that certain classes in the ImageNet dataset contain spurious signals that are easily detected by the models and discuss the underlying reasons for those spurious signals. In light of our findings, we provide an exhaustive list of the aforementioned images and call for caution in their use in future research efforts. Lastly, we present a case study investigating spurious signals in invasive breast mass classification, grounding our work in real-world scenarios.

Problem

Research questions and friction points this paper is trying to address.

Detecting spurious correlations in vision transformers

Identifying unintended patterns that lead to correct predictions for the wrong reasons

Analyzing training methodology impact on spurious reliance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Token discarding method for spurious correlation detection

Analysis of vision transformers trained with supervised and self-supervised methods

Large-scale ImageNet experiments identifying class-specific spurious signals
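
The token discarding idea listed above can be illustrated with a small, model-agnostic sketch. It is not the paper's exact procedure, which this summary does not spell out. It assumes a hypothetical `predict` callable that maps a variable-length list of patch tokens to the model's confidence for the target class, and it scores each token by how much that confidence drops when the token is discarded. The names `token_discard_scores` and `toy_predict` and the toy data are illustrative only.

```python
from typing import Callable, List, Sequence

Token = Sequence[float]


def token_discard_scores(
    tokens: List[Token],
    predict: Callable[[List[Token]], float],
) -> List[float]:
    """Return, for each token, the drop in target-class confidence
    observed when that single token is removed from the input."""
    base = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        # Discard token i and keep the rest in their original order.
        kept = [t for j, t in enumerate(tokens) if j != i]
        scores.append(base - predict(kept))
    return scores


def toy_predict(tokens: List[Token]) -> float:
    """Hypothetical stand-in for a ViT: confidence is driven entirely by
    the single strongest token, mimicking reliance on one small region."""
    return max((t[0] for t in tokens), default=0.0)


# Four one-dimensional "patch tokens"; the third one dominates the prediction.
tokens = [[0.1], [0.2], [0.9], [0.1]]
scores = token_discard_scores(tokens, toy_predict)

# Index of the token whose removal hurts the prediction most.
top = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setting only the third token has a non-zero score, so the prediction depends on a single region. Whether such a region is spurious still has to be judged against where the labeled object actually is in the image.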

🔎 Similar Papers

MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification