🤖 AI Summary
This work addresses a key limitation in existing analyses of sparse autoencoders (SAEs) applied to CLIP vision encoders: they focus predominantly on feature semantics while overlooking the spatial extent over which visual evidence is aggregated. To bridge this gap, the study introduces "information scope" as a new dimension of interpretability and proposes the Contextual Dependency Score (CDS) to quantify how local or global each SAE feature is. Through systematic spatial-perturbation experiments on CLIP representations, the authors demonstrate that features with differing information scopes influence model predictions and confidence in systematically different ways. This approach disentangles SAE features into positionally stable local-scope components and positionally variant global-scope components, substantially expanding the analytical framework for SAE interpretability.
📝 Abstract
Sparse Autoencoders (SAEs) have emerged as a powerful tool for interpreting the internal representations of CLIP vision encoders, yet existing analyses largely focus on the semantic meaning of individual features. We introduce information scope as a complementary dimension of interpretability that characterizes how broadly an SAE feature aggregates visual evidence, ranging from localized, patch-specific cues to global, image-level signals. We observe that some SAE features respond consistently across spatial perturbations, while others shift unpredictably with minor input changes, indicating a fundamental distinction in their underlying scope. To quantify this, we propose the Contextual Dependency Score (CDS), which separates positionally stable local-scope features from positionally variant global-scope features. Our experiments show that features of different information scopes exert systematically different influences on CLIP's predictions and confidence. These findings establish information scope as a critical new axis for understanding CLIP representations and provide a deeper diagnostic view of SAE-derived features.
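The abstract does not spell out the CDS formula, but its description (comparing SAE feature activations across spatial perturbations of the same image) suggests a simple dispersion-style measurement. Below is a minimal sketch under that assumption; `clip_encode`, `sae_encode`, and the perturbation set are hypothetical placeholders, not the authors' implementation.

```python
import torch

def contextual_dependency_score(image: torch.Tensor,
                                perturb_fns,
                                clip_encode,
                                sae_encode) -> torch.Tensor:
    """Sketch of a CDS-style score: how much each SAE feature's activation
    varies when the input is spatially perturbed (e.g., crops, shifts,
    patch shuffles).

    Assumes `sae_encode(clip_encode(x))` returns a 1-D tensor of feature
    activations for a single image. Returns one score per feature: low
    values suggest a positionally stable (local-scope) feature, high
    values a positionally variant (global-scope) feature.
    """
    acts = []
    for perturb in perturb_fns:
        x = perturb(image)        # one spatially perturbed view
        h = clip_encode(x)        # CLIP representation of that view
        z = sae_encode(h)         # sparse feature activations, shape (F,)
        acts.append(z)
    acts = torch.stack(acts)      # shape (n_perturbations, n_features)
    # Dispersion across perturbations, normalized by mean activation so
    # strongly and weakly firing features are comparable.
    eps = 1e-8
    return acts.std(dim=0) / (acts.mean(dim=0).abs() + eps)
```

Thresholding such a score would then split the dictionary into a low-CDS, positionally stable local-scope group and a high-CDS, positionally variant global-scope group, mirroring the separation the abstract describes.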