🤖 AI Summary
Traditional CNNs struggle to model the fine-grained, complex discriminative features needed in medical image analysis. To address this, we propose a systematic attention-integration framework that embeds Squeeze-and-Excitation and hybrid convolutional attention mechanisms into five mainstream architectures—VGG16, ResNet18, InceptionV3, DenseNet121, and EfficientNetB5—enabling adaptive feature recalibration along both channel and spatial dimensions. The method is validated across modalities—brain tumor MRI and histopathological images—demonstrating consistent improvements in classification accuracy and localization interpretability; EfficientNetB5 augmented with hybrid attention achieves the best overall performance among the evaluated models. Our core contributions are: (1) establishing a reproducible, plug-and-play attention evaluation paradigm; and (2) empirically validating the generalizable performance gains and enhanced mechanistic interpretability of attention mechanisms across diverse architectures and medical imaging modalities.
📝 Abstract
Deep learning has become a powerful tool for medical image analysis; however, conventional Convolutional Neural Networks (CNNs) often fail to capture the fine-grained, complex features critical for accurate diagnosis. To address this limitation, we systematically integrate attention mechanisms into five widely adopted CNN architectures, namely VGG16, ResNet18, InceptionV3, DenseNet121, and EfficientNetB5, to enhance their ability to focus on salient regions and improve discriminative performance. Specifically, each baseline model is augmented with either a Squeeze-and-Excitation (SE) block or a hybrid Convolutional Block Attention Module (CBAM), allowing adaptive recalibration of channel and spatial feature representations. The proposed models are evaluated on two distinct medical imaging datasets: a brain tumor MRI dataset comprising multiple tumor subtypes, and a Products of Conception histopathological dataset containing four tissue categories. Experimental results demonstrate that attention-augmented CNNs consistently outperform baseline architectures across all metrics. In particular, EfficientNetB5 with hybrid attention achieves the highest overall performance, delivering substantial gains on both datasets. Beyond improved classification accuracy, attention mechanisms enhance feature localization, leading to better generalization across heterogeneous imaging modalities. This work contributes a systematic comparative framework for embedding attention modules in diverse CNN architectures and rigorously assesses their impact across multiple medical imaging tasks. The findings provide practical insights for the development of robust, interpretable, and clinically applicable deep-learning-based decision support systems.
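The channel recalibration that an SE block performs—squeeze via global average pooling, excitation via a bottleneck gate, then per-channel rescaling—can be sketched in a few lines. This is a minimal, framework-free NumPy illustration of the mechanism, not the paper's implementation; the function name, weight shapes, and toy inputs below are illustrative assumptions.

```python
import numpy as np

def se_block(feature_map, w_reduce, w_expand):
    """Squeeze-and-Excitation sketch on a single (C, H, W) feature map.

    w_reduce: (C//r, C) bottleneck weights, w_expand: (C, C//r) expansion
    weights (r is the reduction ratio; biases omitted for brevity).
    """
    # Squeeze: global average pooling over spatial dims -> (C,) descriptor
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then sigmoid gate in (0, 1)
    s = np.maximum(w_reduce @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w_expand @ s)))
    # Scale: reweight each channel of the original map by its gate value
    return feature_map * gate[:, None, None]

# Toy example: 4 channels, reduction ratio r = 2, illustrative weights
x = np.ones((4, 2, 2))
w_reduce = np.eye(2, 4)
w_expand = np.zeros((4, 2))  # zero weights -> sigmoid(0) = 0.5 gate
y = se_block(x, w_reduce, w_expand)
```

CBAM follows the same squeeze/gate/scale pattern but adds a second, spatial attention map computed from channel-pooled statistics; in both cases the module is "plug-and-play" because its output shape matches its input shape.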