🤖 AI Summary
This work addresses the limited interpretability of feature representations in deep neural networks (DNNs). We propose a unified visualization framework based on activation maximization that, to our knowledge, is the first to systematically extend feature visualization to intermediate layers of both convolutional neural networks (CNNs) and vision transformers (ViTs). By integrating gradient-based optimization with multi-scale regularization, the method generates semantically meaningful visualizations that elucidate how layer-wise neuronal representations evolve hierarchically. Furthermore, we leverage the same framework to synthesize high-fidelity adversarial examples, thereby characterizing model decision boundaries and exposing structural vulnerabilities. Experiments demonstrate strong generalizability and interpretability across CNNs and ViTs, validating the framework's effectiveness in revealing internal representational semantics and failure modes. This approach establishes a novel paradigm for model diagnosis and robustness analysis.
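The core mechanism the summary describes, gradient ascent on the input to maximize a chosen neuron's activation under a regularizer, can be illustrated with a deliberately tiny stand-in model. The sketch below is an illustrative assumption, not the paper's implementation: the "neuron" is a single linear unit, and plain L2 regularization stands in for the multi-scale regularization mentioned above.

```python
import numpy as np

# Minimal activation-maximization sketch (toy stand-in for a real DNN).
# We maximize the response of one "neuron", f(x) = w . x, by gradient
# ascent on the input x, with an L2 penalty keeping x bounded.
# All names (w, lr, lam) and the linear model are illustrative assumptions.

rng = np.random.default_rng(0)
w = rng.normal(size=16)           # weights of the target neuron
x = rng.normal(size=16) * 0.01    # start from a near-blank input

lr, lam = 0.1, 0.01               # step size, L2 penalty strength
for _ in range(200):
    # objective: f(x) - lam * ||x||^2 ; gradient w.r.t. x is w - 2*lam*x
    x += lr * (w - 2 * lam * x)

# The optimized input aligns with the neuron's preferred direction,
# which is exactly what feature visualization renders as an image.
cosine = x @ w / (np.linalg.norm(x) * np.linalg.norm(w))
```

In a real network the only changes are that the gradient comes from backpropagation through the layers below the chosen unit, and the regularizer is richer (e.g. multi-scale smoothness) so the optimized input remains visually interpretable.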
📝 Abstract
Understanding the internal feature representations of deep neural networks (DNNs) is a fundamental step toward model interpretability. Inspired by neuroscience methods that probe biological neurons with visual stimuli, recent deep learning studies have employed Activation Maximization (AM) to synthesize inputs that elicit strong responses from artificial neurons. In this work, we propose a unified feature visualization framework applicable to both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Unlike prior efforts, which predominantly focus on the final output-layer neurons of CNNs, we extend feature visualization to intermediate layers as well, offering deeper insight into the hierarchical structure of learned feature representations. Furthermore, we investigate how activation maximization can be leveraged to generate adversarial examples, revealing potential vulnerabilities and decision boundaries of DNNs. Our experiments demonstrate the effectiveness of our approach on both traditional CNNs and modern ViTs, highlighting its generalizability and interpretive value.
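The adversarial use of AM described in the abstract amounts to maximizing the activation of a *wrong-class* output neuron starting from a real input. A hedged sketch of that idea, on a deterministic toy linear classifier assumed purely for illustration (class `i` fires on input feature `i`), looks like this:

```python
import numpy as np

# Hedged sketch: adversarial example generation as activation maximization
# on an output-class neuron. The "model" is a toy 3-class linear classifier
# (an assumption standing in for a trained DNN), not the paper's setup.

W = np.eye(3, 8)                 # class i responds to input feature i
x = np.zeros(8)
x[0] = 1.0                       # clean input: clearly class 0

def predict(v):
    return int(np.argmax(W @ v))

target = 1                       # wrong class whose logit we ascend
x_adv = x.copy()
eps, steps = 0.05, 30
for _ in range(steps):
    # for a linear model, the gradient of the target logit w.r.t.
    # the input is simply the target neuron's weight row W[target]
    x_adv += eps * W[target]

# a structured perturbation along the target neuron's gradient
# flips the model's decision from class 0 to the target class
```

In a real DNN the per-step gradient comes from backpropagation and the perturbation is typically constrained (e.g. to a small norm ball) so the adversarial input stays visually close to the original; the crossing point of the prediction traces the model's decision boundary.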