Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Mechanistic interpretability of vision and video models has been hindered by a lack of standardized tools and inaccessible pre-trained weights. To address this, the authors introduce Prisma, an open-source interpretability toolkit that supports over 75 vision and video Transformers through a uniform interface. It integrates sparse autoencoder (SAE), transcoder, and crosscoder training, along with activation caching, circuit discovery, and multimodal visualization. Their empirical analysis yields two notable findings: SAEs trained on Vision Transformers can exhibit substantially lower sparsity than their language-model counterparts, and, counterintuitively, some SAE reconstructions decrease model loss rather than increase it. The release includes 80+ pre-trained SAE weights and systematic tutorials. By unifying infrastructure, evaluation protocols, and educational resources, the toolkit lowers barriers to entry and lays a foundation for standardized, reproducible interpretability research in vision models.

📝 Abstract
Robust tooling and publicly available pre-trained models have helped drive recent advances in mechanistic interpretability for language models. However, similar progress in vision mechanistic interpretability has been hindered by the lack of accessible frameworks and pre-trained weights. We present Prisma (Access the codebase here: https://github.com/Prisma-Multimodal/ViT-Prisma), an open-source framework designed to accelerate vision mechanistic interpretability research, providing a unified toolkit for accessing 75+ vision and video transformers; support for sparse autoencoder (SAE), transcoder, and crosscoder training; a suite of 80+ pre-trained SAE weights; activation caching, circuit analysis tools, and visualization tools; and educational resources. Our analysis reveals surprising findings, including that effective vision SAEs can exhibit substantially lower sparsity patterns than language SAEs, and that in some instances, SAE reconstructions can decrease model loss. Prisma enables new research directions for understanding vision model internals while lowering barriers to entry in this emerging field.
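To make the abstract's core objects concrete, here is a minimal NumPy sketch of a sparse autoencoder forward pass over ViT-style residual-stream activations, including the L0 sparsity measure behind the "vision SAEs can be less sparse than language SAEs" finding. This is a conceptual illustration only, not Prisma's actual API; all names, sizes, and the random stand-in weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 256  # hypothetical model and SAE dictionary sizes

# Random weights stand in for a trained SAE's parameters.
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode with a ReLU into an overcomplete basis, then decode linearly."""
    z = np.maximum(x @ W_enc + b_enc, 0.0)  # sparse latent code
    x_hat = z @ W_dec + b_dec               # reconstruction of the activation
    return x_hat, z

# Stand-in for a batch of 10 token activations from a vision transformer.
x = rng.normal(size=(10, d_model))
x_hat, z = sae_forward(x)

# L0 "sparsity": mean number of active latents per token (lower = sparser).
l0 = float((z > 0).sum(axis=1).mean())
mse = float(((x - x_hat) ** 2).mean())
```

Comparing this L0 statistic between SAEs trained on vision versus language activations is the kind of evaluation the paper reports.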
Problem

Research questions and friction points this paper is trying to address.

Lack of accessible frameworks and pre-trained weights for vision mechanistic interpretability
Need for a unified toolkit spanning many vision and video transformers
High barriers to entry for research into vision model internals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source framework (Prisma) for vision and video interpretability research
Supports sparse autoencoder, transcoder, and crosscoder training
Ships 80+ pre-trained SAE weights plus activation caching, circuit analysis, and visualization tools
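The transcoder contribution listed above differs from a plain SAE: instead of reconstructing its own input, a transcoder imitates a component's input-to-output map (typically an MLP block) through a sparse bottleneck. The following NumPy sketch illustrates that training objective under stated assumptions; the toy "MLP", all weights, and the penalty coefficient are hypothetical and unrelated to Prisma's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden = 64, 512  # hypothetical model and transcoder widths

# Toy MLP whose input -> output mapping the transcoder will imitate.
W1 = rng.normal(0, 0.1, (d_model, 128))
W2 = rng.normal(0, 0.1, (128, d_model))
def mlp(x):
    return np.maximum(x @ W1, 0.0) @ W2

# Transcoder: sparse ReLU bottleneck mapping MLP inputs to MLP outputs.
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))

def transcoder(x):
    z = np.maximum(x @ W_enc, 0.0)  # sparse latent features
    return z @ W_dec, z

# Stand-in batch of pre-MLP activations.
x = rng.normal(size=(8, d_model))
y_true = mlp(x)          # target: the MLP's actual output
y_hat, z = transcoder(x)

# Objective: match the MLP's output, plus an L1 penalty encouraging sparsity.
loss = float(((y_true - y_hat) ** 2).mean() + 1e-3 * np.abs(z).mean())
```

Because the latents explain the component's computation rather than just its input, transcoders are often used for circuit-style analysis of what an MLP block does.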