🤖 AI Summary
Poor interpretability of CNNs on SAR imagery hinders their deployment in high-reliability applications. To address this, we propose multi-weight self-matching class activation mapping (MS-CAM), a visualization method that matches SAR images with the feature maps and gradients extracted by the CNN and combines channel-wise and element-wise weights to localize decision-relevant regions. The method is grounded in the class activation mapping (CAM) paradigm and addresses CAM's reliance on a single level of weighting by weighting at both the channel and the element level. It also supports weakly supervised object localization without pixel-level annotations, and the factors that affect localization accuracy, such as pixel thresholds, are analyzed. Evaluated on a self-constructed SAR target classification dataset, MS-CAM more accurately highlights the network's regions of interest and captures detailed target features, providing an interpretability tool for CNN-based SAR image analysis.
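To make the idea of two weighting levels concrete, here is a minimal NumPy sketch of a CAM-style heatmap that uses both a channel-wise and an element-wise weight. It is not the MS-CAM formulation, which this summary does not spell out. The function name `dual_weight_cam`, the Grad-CAM-style channel weights (spatial mean of the gradients), and the choice of positive gradients as element-wise weights are all assumptions made for illustration.

```python
import numpy as np

def dual_weight_cam(features, grads):
    """Illustrative two-level weighted activation map (not the paper's formula).

    features, grads: arrays of shape (C, H, W) holding one layer's feature
    maps and the gradients of the class score with respect to them.
    """
    # Channel-wise weight: one scalar per channel (assumed: spatial mean of gradients).
    channel_w = grads.mean(axis=(1, 2))                      # shape (C,)
    # Element-wise weight: one value per position (assumed: positive gradients).
    element_w = np.maximum(grads, 0.0)                       # shape (C, H, W)
    # Apply both weights to the feature maps and sum over channels.
    weighted = channel_w[:, None, None] * element_w * features
    cam = np.maximum(weighted.sum(axis=0), 0.0)              # ReLU, shape (H, W)
    # Normalize to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy input: a single channel whose only non-zero gradient is at position (0, 1).
features = np.ones((1, 2, 2))
grads = np.zeros((1, 2, 2))
grads[0, 0, 1] = 4.0
cam = dual_weight_cam(features, grads)
```

In this toy case the heatmap is non-zero only at the position with a positive gradient. In practice the feature maps and gradients would come from a trained CNN's convolutional layer, and the map would be upsampled to the input image size before overlaying it on the SAR image.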
📝 Abstract
In recent years, convolutional neural networks (CNNs) have achieved significant success in various synthetic aperture radar (SAR) tasks. However, the complexity and opacity of their internal mechanisms make it difficult to meet high-reliability requirements, which limits their application in SAR. Improving the interpretability of CNNs is thus of great importance for their development and deployment in SAR. In this paper, a visual explanation method termed multi-weight self-matching class activation mapping (MS-CAM) is proposed. MS-CAM matches SAR images with the feature maps and corresponding gradients extracted by the CNN, and combines both channel-wise and element-wise weights to visualize the decision basis learned by the model in SAR images. Extensive experiments conducted on a self-constructed SAR target classification dataset demonstrate that MS-CAM more accurately highlights the network's regions of interest and captures detailed target feature information, thereby enhancing network interpretability. Furthermore, the feasibility of applying MS-CAM to weakly supervised object localization is validated. Key factors affecting localization accuracy, such as pixel thresholds, are analyzed in depth to inform future work.
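The role of the pixel threshold in weakly supervised localization can be illustrated with a short sketch. The abstract does not describe the paper's localization procedure, so this is only an assumed, common approach: keep the heatmap pixels at or above a fraction of the peak value and take the box that encloses them. The helper name `cam_to_bbox` and the default threshold of 0.5 are hypothetical.

```python
import numpy as np

def cam_to_bbox(cam, threshold=0.5):
    """Return (row_min, col_min, row_max, col_max) enclosing all pixels whose
    value is at least `threshold` times the heatmap peak, or None if the
    heatmap has no positive response. Illustrative only."""
    peak = cam.max()
    if peak <= 0:
        return None
    mask = cam >= threshold * peak
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing kept pixels
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing kept pixels
    return (int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1]))

# Toy heatmap: a strong rectangular response plus one weak stray pixel.
cam = np.zeros((6, 6))
cam[2:4, 1:5] = 1.0
cam[0, 0] = 0.2

tight_box = cam_to_bbox(cam, threshold=0.5)   # stray pixel is excluded
loose_box = cam_to_bbox(cam, threshold=0.1)   # stray pixel enlarges the box
```

Lowering the threshold lets weak responses into the mask and enlarges the box, which is one way the threshold can affect localization accuracy. This sketch boxes all pixels above the threshold; restricting the box to the largest connected region is a frequent alternative and would need an extra labeling step.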