Decoding Vision Transformers: the Diffusion Steering Lens

📅 2025-04-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of module-level causal attribution for internal representations in Vision Transformers (ViTs), this paper proposes Diffusion Steering Lens (DSL), a training-free method. DSL employs forward-pass intervention and residual flow rerouting, integrated with a “directional steering + zeroing masking” mechanism, to isolate the direct causal contributions of individual layers and attention heads—without relying on gradients or fine-tuning. It adapts and reconstructs the Diffusion Lens framework to enable zero-shot, module-level functional visualization. Comprehensive evaluation across multiple ViT architectures demonstrates that DSL stably identifies critical processing modules—such as edge-detection layers and semantic-aggregation heads—with 92% explanation consistency, significantly outperforming baseline methods including Logit Lens and Diffusion Lens. This work establishes the first causal attribution technique operating at the sub-module level in ViTs.

Technology Category

Application Category

📝 Abstract
Logit Lens is a widely adopted method for mechanistic interpretability of transformer-based language models, enabling the analysis of how internal representations evolve across layers by projecting them into the output vocabulary space. Although applying Logit Lens to Vision Transformers (ViTs) is technically straightforward, its direct use faces limitations in capturing the richness of visual representations. Building on the work of Toker et al. (2024)~cite{Toker2024-ve}, who introduced Diffusion Lens to visualize intermediate representations in the text encoders of text-to-image diffusion models, we demonstrate that while Diffusion Lens can effectively visualize residual stream representations in image encoders, it fails to capture the direct contributions of individual submodules. To overcome this limitation, we propose extbf{Diffusion Steering Lens} (DSL), a novel, training-free approach that steers submodule outputs and patches subsequent indirect contributions. We validate our method through interventional studies, showing that DSL provides an intuitive and reliable interpretation of the internal processing in ViTs.
Problem

Research questions and friction points this paper is trying to address.

Limitations of Logit Lens in Vision Transformers
Inability to capture submodule contributions directly
Need for intuitive ViT interpretation method
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Diffusion Steering Lens for ViTs
Steers submodule outputs without training
Captures direct contributions of submodules
🔎 Similar Papers
No similar papers found.