Learning to Adapt to Position Bias in Vision Transformer Classifiers

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the cross-dataset variability of position bias in Vision Transformers (ViTs). To tackle this, we propose an adaptive positional embedding mechanism. First, we introduce Position-SHAP, an extension of SHAP values to position embeddings, to quantitatively measure how much positional information contributes to classification decisions. Second, we design Auto-PE, a lightweight, single-parameter learnable module that suppresses or preserves positional information by modulating the norm of the position embedding. Auto-PE requires no modification to the ViT backbone and is compatible with diverse positional embedding schemes. Experiments on multiple image classification benchmarks show that Auto-PE adds negligible computational overhead while matching or improving classification accuracy, validating data-driven positional modeling. Our core contributions are threefold: (1) a direct quantitative measure of position bias in ViTs; (2) a parameter-efficient, adaptive positional embedding mechanism; and (3) empirical evidence that positional modeling should align with the position bias of the target dataset.

📝 Abstract
How discriminative position information is for image classification depends on the data. On the one hand, the camera position is arbitrary and objects can appear anywhere in the image, arguing for translation invariance. At the same time, position information is key for exploiting capture/center bias and scene layout, e.g., the sky is up. We show that position bias, the degree to which a dataset is more easily solved when positional information on input features is used, plays a crucial role in the performance of Vision Transformer image classifiers. To investigate, we propose Position-SHAP, a direct measure of position bias obtained by extending SHAP to work with position embeddings. We show varying levels of position bias across datasets, and find that the optimal choice of position embedding depends on the position bias apparent in the dataset. We therefore propose Auto-PE, a single-parameter position embedding extension that allows the position embedding to modulate its norm, enabling the unlearning of position information. Auto-PE combines with existing PEs to match or improve accuracy on classification datasets.
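The abstract describes Auto-PE as a single learnable parameter that modulates the norm of the position embedding, so training can drive the PE's contribution toward zero on datasets with little position bias. The exact parameterization is not given here; the sketch below is a minimal illustration only, assuming the gate is `tanh` of a single learnable scalar `alpha` (a hypothetical choice, not necessarily the paper's).

```python
import numpy as np

def auto_pe(tokens, pos_embed, alpha):
    """Add a position embedding whose norm is scaled by a single
    scalar gate. With alpha = 0 the PE is fully suppressed; as
    |alpha| grows, tanh saturates and the PE is fully preserved.
    In a real ViT, alpha would be a learnable parameter trained
    jointly with the backbone."""
    gate = np.tanh(alpha)  # single scalar in (-1, 1)
    return tokens + gate * pos_embed

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))     # e.g. 14x14 patches, embed dim 64
pos_embed = rng.normal(size=(196, 64))  # a (here fixed) position embedding

no_pe = auto_pe(tokens, pos_embed, alpha=0.0)    # position info "unlearned"
full_pe = auto_pe(tokens, pos_embed, alpha=5.0)  # position info kept
```

Because the gate is one scalar, it adds negligible parameters and compute, which matches the abstract's claim that Auto-PE is a lightweight extension compatible with existing PE schemes.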
Problem

Research questions and friction points this paper is trying to address.

Measuring position bias impact on Vision Transformer classifiers
Determining optimal position embeddings for varying datasets
Developing adaptive position embeddings to improve classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Position-SHAP to measure position bias
Introduces Auto-PE for adaptive position embedding
Modulates position embedding norm for flexibility
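Position-SHAP, as described above, extends SHAP to position embeddings: it attributes part of the model's performance to positional information via Shapley values. As a toy illustration of the underlying idea (not the paper's actual estimator), the sketch below computes the exact Shapley value of a "position" component in a two-player coalition game over input components, using hypothetical accuracy numbers.

```python
from itertools import permutations

# Hypothetical classifier accuracies when only certain input
# components are available (illustrative numbers, not from the paper):
value = {
    frozenset(): 0.10,                        # chance level
    frozenset({"content"}): 0.70,             # appearance only
    frozenset({"position"}): 0.25,            # position embedding only
    frozenset({"content", "position"}): 0.80, # full model
}

def shapley(player, players, value):
    """Exact Shapley value: average marginal contribution of `player`
    over all orderings of the players."""
    orders = list(permutations(players))
    total = 0.0
    for order in orders:
        before = frozenset(order[:order.index(player)])
        total += value[before | {player}] - value[before]
    return total / len(orders)

phi_pos = shapley("position", ["content", "position"], value)
# average of (0.25 - 0.10) and (0.80 - 0.70) = 0.125
```

A large `phi_pos` would indicate a position-biased dataset, for which a strong position embedding helps; a value near zero would suggest the PE can be suppressed, which is exactly the regime Auto-PE is designed to adapt to.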