Attention, Please! Revisiting Attentive Probing for Masked Image Modeling

📅 2025-06-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Linear probing (LP) under-reports the quality of representations learned with masked image modeling (MIM), because the relevant information is distributed across patch tokens. This paper proposes Efficient Probing (EP), a lightweight attentive probing method built on a multi-query cross-attention mechanism that removes redundant projections and selectively aggregates patch features. EP reduces the number of trainable parameters and achieves up to a 10× speed-up over conventional multi-head attention, while still producing interpretable attention maps. Across seven benchmarks, EP outperforms both LP and prior attentive probing methods, generalizes beyond MIM to other pre-training paradigms, and shows strong gains in low-shot and layer-wise evaluation settings.

📝 Abstract

As fine-tuning (FT) becomes increasingly impractical at scale, probing is emerging as the preferred evaluation protocol for self-supervised learning (SSL). Yet, the standard linear probing (LP) fails to adequately reflect the potential of models trained with Masked Image Modeling (MIM), due to the distributed nature of patch tokens. This motivates the need for attentive probing, an alternative that uses attention to selectively aggregate patch-level features. Despite its growing adoption, attentive probing remains under-explored, with existing methods suffering from excessive parameterization and poor computational efficiency. In this work, we revisit attentive probing through the lens of the accuracy-efficiency trade-off. We conduct a systematic study of existing methods, analyzing their mechanisms and benchmarking their performance. We introduce efficient probing (EP), a multi-query cross-attention mechanism that eliminates redundant projections, reduces the number of trainable parameters, and achieves up to a 10$\times$ speed-up over conventional multi-head attention. Despite its simplicity, EP outperforms LP and prior attentive probing approaches across seven benchmarks, generalizes well beyond MIM to diverse pre-training paradigms, produces interpretable attention maps, and achieves strong gains in low-shot and layer-wise settings. Code available at https://github.com/billpsomas/efficient-probing.
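
To make the idea of multi-query cross-attention pooling concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `multi_query_pool`, the parameter names, the choice to keep only a value projection, and the per-query channel split are all assumptions made for this example. The linked repository is the reference for the actual method.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_pool(patches, queries, w_value):
    """Pool N patch tokens into one d-dimensional feature vector.

    patches: (N, d) patch features from a frozen backbone
    queries: (M, d) learnable query vectors (hypothetical name)
    w_value: (d, d) value projection; assumed here to be the only
             projection kept (no query, key, or output projection)
    """
    n, d = patches.shape
    m = queries.shape[0]
    assert d % m == 0, "d must be divisible by the number of queries"
    # Each query scores the raw patch features directly: (M, N).
    scores = queries @ patches.T / np.sqrt(d)
    # Normalize over patches so each query's weights sum to 1.
    attn = softmax(scores, axis=1)
    # Project values once, then give each query its own channel slice.
    values = (patches @ w_value).reshape(n, m, d // m)  # (N, M, d/M)
    # Query j takes a weighted average of slice j over all patches.
    pooled = np.einsum("mn,nmc->mc", attn, values)      # (M, d/M)
    # Concatenate the M slices back into a single d-dim vector.
    return pooled.reshape(d), attn

rng = np.random.default_rng(0)
patches = rng.standard_normal((196, 768))        # e.g. 14x14 patches
queries = rng.standard_normal((12, 768)) * 0.02  # 12 queries (assumed)
w_value = rng.standard_normal((768, 768)) * 0.02
feature, attn = multi_query_pool(patches, queries, w_value)
print(feature.shape, attn.shape)  # (768,) (12, 196)
```

The pooled `feature` would be passed to a linear classifier, and each row of `attn` can be reshaped to the patch grid to inspect which regions a given query attends to.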

Problem

Research questions and friction points this paper is trying to address.

Improving probing for self-supervised learning evaluation

Reducing parameterization in attentive probing methods

Enhancing computational efficiency in feature aggregation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient probing with multi-query cross-attention

Eliminates redundant projections for speed-up

Outperforms linear and prior attentive probing
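
A rough back-of-the-envelope count shows why dropping projections matters for the size of the probe. The figures below are illustrative arithmetic for an assumed configuration (feature dimension 768, 12 queries, biases and the classifier head ignored), not numbers reported in the paper. Which projections are removed is also an assumption of this sketch.

```python
def mha_probe_params(d):
    # Conventional multi-head cross-attention pooling: query, key, value
    # and output projections (each d x d), plus one learnable query token.
    return 4 * d * d + d

def reduced_probe_params(d, m):
    # Assumed reduced variant: m learnable queries of size d and a
    # single d x d value projection.
    return m * d + d * d

d, m = 768, 12  # assumed ViT-B-like width and number of queries
full = mha_probe_params(d)
reduced = reduced_probe_params(d, m)
print(full)     # 2360064
print(reduced)  # 599040
print(f"reduction factor: {full / reduced:.2f}x")
```

Under these assumptions the attention pooling module shrinks to roughly a quarter of its original size, since three of the four quadratic-cost projection matrices are gone and the added queries only cost `m * d` parameters.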
