🤖 AI Summary
This work introduces NAVE (Neuro-Activated Vision Explanations), an unsupervised method that clusters the feature activations of a frozen vision encoder to reveal which image regions are processed similarly and which information is retained in deeper layers. Rather than explaining individual predictions, NAVE probes the learned representation itself. Applied across models, it shows that: (1) the training dataset and the level of supervision shape which concepts the encoder captures; (2) registers change how vision transformers (ViTs) organize visual information; and (3) a watermark-based Clever Hans effect in the training set causes information saturation in the learned features. Together, these analyses position NAVE as a practical tool for concept-level inspection of visual representations.
📝 Abstract
Recent work in explainable artificial intelligence (XAI) for vision models investigates the information extracted by their feature encoders. We contribute to this effort and propose Neuro-Activated Vision Explanations (NAVE), which extracts the information captured by the encoder by clustering the feature activations of the frozen network to be explained. The method does not aim to explain the model's prediction but to answer questions such as which parts of the image are processed similarly or which information is kept in deeper layers. Experimentally, we leverage NAVE to show that the training dataset and the level of supervision affect which concepts are captured. In addition, our method reveals the impact of registers on vision transformers (ViTs) and the information saturation caused by a watermark Clever Hans effect in the training set.
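The core idea of clustering the per-location activations of a frozen encoder can be sketched as follows. This is a minimal illustrative example, not the paper's exact procedure: the "activations" here are stand-in random arrays instead of real encoder features, and the plain k-means loop is an assumption about the clustering step.

```python
import numpy as np

def cluster_activations(feats, k=2, n_iter=20):
    """Cluster per-location feature vectors (H, W, C) into k groups.

    Stand-in for the NAVE idea: each spatial position of a frozen
    encoder's feature map is a point to cluster; positions that land
    in the same cluster are "processed similarly" by the encoder.
    """
    H, W, C = feats.shape
    X = feats.reshape(-1, C).astype(float)
    # Deterministic init: k evenly spaced points from the flattened map.
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].copy()
    for _ in range(n_iter):
        # Assign each location to its nearest center (squared L2).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Update each center to the mean of its assigned points.
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return labels.reshape(H, W)

# Toy "feature map": the left half is shifted so it forms its own cluster.
rng = np.random.default_rng(1)
feats = rng.normal(size=(8, 8, 16))
feats[:, :4, :] += 5.0
seg = cluster_activations(feats, k=2)  # (8, 8) map of cluster labels
```

In the real pipeline, `feats` would come from an intermediate layer of the frozen network under study (e.g. ViT patch tokens reshaped to a grid), and the resulting cluster map serves as the explanation overlay.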