🤖 AI Summary
This work introduces NAVE (Neuro-Activated Vision Explanations), an unsupervised method that clusters the feature activations of a frozen vision encoder to reveal which image regions are processed similarly and which information is retained in deeper layers. Rather than explaining individual predictions, NAVE probes the learned representation itself. Applied across models, it shows that: (1) the training dataset and the level of supervision shape which concepts the encoder captures; (2) registers change how vision transformers (ViTs) organize visual information; and (3) a watermark-based Clever Hans effect in the training set causes information saturation in the learned features. Together, these analyses position NAVE as a practical tool for concept-level inspection of visual representations.
📝 Abstract
Recent work in explainable artificial intelligence (XAI) for vision models investigates the information extracted by their feature encoders. We contribute to this effort and propose Neuro-Activated Vision Explanations (NAVE), which extracts the information captured by the encoder by clustering the feature activations of the frozen network to be explained. The method does not aim to explain the model's prediction but to answer questions such as which parts of the image are processed similarly or which information is kept in deeper layers. Experimentally, we leverage NAVE to show that the training dataset and the level of supervision affect which concepts are captured. In addition, our method reveals the impact of registers on vision transformers (ViTs) and the information saturation caused by a watermark Clever Hans effect in the training set.
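The core idea of clustering the per-location activations of a frozen encoder can be sketched as follows. This is a minimal illustrative example, not the paper's exact procedure: the "activations" here are stand-in random arrays instead of real encoder features, and the plain k-means loop is an assumption about the clustering step.

```python
import numpy as np

def cluster_activations(feats, k=2, n_iter=20):
    """Cluster per-location feature vectors (H, W, C) into k groups.

    Stand-in for the NAVE idea: each spatial position of a frozen
    encoder's feature map is a point to cluster; positions that land
    in the same cluster are "processed similarly" by the encoder.
    """
    H, W, C = feats.shape
    X = feats.reshape(-1, C).astype(float)
    # Deterministic init: k evenly spaced points from the flattened map.
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].copy()
    for _ in range(n_iter):
        # Assign each location to its nearest center (squared L2).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Update each center to the mean of its assigned points.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return labels.reshape(H, W)

# Toy "feature map": the left half is shifted so it forms its own cluster.
rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 8, 16))
feats[:, :4, :] += 5.0
seg = cluster_activations(feats, k=2)  # (8, 8) map of cluster labels
```

In the real pipeline, `feats` would come from an intermediate layer of the frozen network under study (e.g. ViT patch tokens reshaped to a grid), and the resulting cluster map serves as the explanation overlay.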