🤖 AI Summary
This study addresses the limited robustness of aerial image scene classification models, which stems from the structural complexity of aerial scenes and the high heterogeneity of ground objects. The paper reviews classification approaches that range from handcrafted features such as SIFT and LBP, through classical convolutional neural networks such as VGG and GoogLeNet, to deep hybrid networks. To address the limitations of these approaches, the authors propose Aerial-Y-Net, a CNN that combines multi-scale feature fusion with a spatial attention mechanism to better capture the semantics of complex aerial scenes. On the AID dataset, Aerial-Y-Net reaches a classification accuracy of 91.72%, outperforming several baseline architectures.
📝 Abstract
Aerial images play a vital role in urban planning and environmental preservation. They contain diverse structures, such as different types of buildings, forests, mountains, and unoccupied land. Because of this heterogeneity, developing robust scene classification models remains a challenge. In this study, we review machine learning methods for aerial image classification, covering approaches from handcrafted features (e.g., SIFT, LBP) to traditional CNNs (e.g., VGG, GoogLeNet) and advanced deep hybrid networks. Building on this review, we design Aerial-Y-Net, a spatial attention-enhanced CNN with a multi-scale feature fusion mechanism that helps the model capture the complexity of aerial scenes. Evaluated on the AID dataset, our model achieves 91.72% accuracy, outperforming several baseline architectures.
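The abstract names two ingredients of Aerial-Y-Net, spatial attention and multi-scale feature fusion, but does not give their exact form. The NumPy sketch below illustrates one common way to build each, under explicit assumptions. The spatial attention follows the CBAM pattern (channel-wise average and max pooling, a 7×7 convolution, a sigmoid gate). The multi-scale fusion simply averages pooled-and-upsampled copies of the feature map at scales 1, 2, and 4. All function names, the kernel size, the scales, and the linear head are hypothetical. This is not the authors' implementation, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, kernel):
    """Single-output 2D cross-correlation with zero padding.

    x has shape (C, H, W), kernel has shape (C, k, k) with odd k;
    the result has shape (H, W), the same spatial size as x.
    """
    _, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))  # constant (zero) padding
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return out


def spatial_attention(features, kernel):
    """CBAM-style spatial attention (assumed variant, not Aerial-Y-Net's own).

    Average- and max-pools across channels, convolves the 2-channel map,
    and rescales every channel by the resulting (H, W) weight map.
    """
    pooled = np.stack([features.mean(axis=0), features.max(axis=0)])  # (2, H, W)
    weights = sigmoid(conv2d_same(pooled, kernel))                     # (H, W)
    return features * weights[None, :, :], weights


def avg_pool(x, s):
    """Non-overlapping s x s average pooling of a (C, H, W) map; H, W divisible by s."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))


def upsample_nearest(x, s):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor s."""
    return x.repeat(s, axis=1).repeat(s, axis=2)


def multi_scale_fusion(features, scales=(1, 2, 4)):
    """Hypothetical fusion: pool at several scales, upsample back, average the branches."""
    branches = [upsample_nearest(avg_pool(features, s), s) for s in scales]
    return np.mean(branches, axis=0)


def classify(features, kernel, head):
    """Fuse scales, apply spatial attention, global-average-pool, then a linear head."""
    fused = multi_scale_fusion(features)
    attended, _ = spatial_attention(fused, kernel)
    descriptor = attended.mean(axis=(1, 2))  # (C,) global average pooling
    return head @ descriptor                 # (num_classes,) class scores


rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))    # stand-in for a CNN backbone's feature map
kernel = 0.1 * rng.standard_normal((2, 7, 7))  # untrained 7x7 attention kernel (assumed size)
head = rng.standard_normal((30, 8))            # hypothetical linear head, 30 classes assumed
logits = classify(features, kernel, head)
print(logits.shape)  # (30,)
```

In a trained network, `kernel` and `head` would be learned parameters and `features` would come from a convolutional backbone. Averaging the branches is only the simplest fusion choice; concatenating them and applying a 1×1 convolution is a common alternative.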