Sparse autoencoders reveal selective remapping of visual concepts during adaptation

📅 2024-12-06
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how prompt-based adaptation alters the way foundation vision models, specifically CLIP-ViT, semantically interpret input images. To this end, the authors propose PatchSAE, the first image-patch-level sparse autoencoder designed for intermediate layers of Vision Transformers (ViTs), enabling fine-grained disentanglement and spatial attribution of interpretable visual concepts (e.g., shape, color, object semantics). Through systematic analysis of input-to-concept mappings before and after adaptation, they find that performance gains stem primarily from reweighting and selective remapping of the model's pre-existing concepts, not from the creation of novel ones. The approach provides a framework for spatially grounded, concept-level disentanglement in ViTs and an interpretable, reproducible methodology for analyzing adapter mechanisms in large vision models, advancing the understanding of prompt learning by revealing its operational principles at the concept level.

📝 Abstract
Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.
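The core mechanism described above, a sparse autoencoder that encodes each ViT patch activation into an overcomplete, non-negative concept code and linearly reconstructs it, can be illustrated with a minimal sketch. All shapes, names, and initializations here are hypothetical stand-ins, not the paper's actual PatchSAE implementation or training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 196 patch tokens from a ViT layer, each a
# 768-dim activation, expanded into a 4096-dim overcomplete sparse code.
n_patches, d_model, d_sae = 196, 768, 4096

W_enc = rng.normal(0.0, 0.02, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0.0, 0.02, (d_sae, d_model))
b_dec = np.zeros(d_model)

def patch_sae_forward(x):
    """Encode per-patch activations into sparse concept codes, then reconstruct.

    x: (n_patches, d_model) activations from one image at one ViT layer.
    Returns (z, x_hat): concept codes and reconstructed activations.
    """
    z = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps codes non-negative
    x_hat = z @ W_dec + b_dec               # linear decoder reconstructs input
    return z, x_hat

# Stand-in for one image's patch-token activations.
x = rng.normal(size=(n_patches, d_model))
z, x_hat = patch_sae_forward(x)

# Typical SAE training objective: reconstruction error plus an L1
# sparsity penalty on the codes (weighting coefficient omitted here).
recon_loss = np.mean((x - x_hat) ** 2)
l1_penalty = np.mean(np.abs(z))
```

Because `z` is per-patch, each concept's activation can be mapped back onto the spatial grid of image patches, which is what enables the patch-wise spatial attribution the abstract describes.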
Problem

Research questions and friction points this paper is trying to address.

Understanding mechanisms in foundation model adaptation
Extracting interpretable visual concepts via Sparse Autoencoder
Explaining adaptation gains using existing model concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed PatchSAE for CLIP vision transformer
Extracted interpretable concepts at granular levels
Explained adaptation gains using existing concepts