AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Remote sensing image scene classification faces challenges arising from complex spatial structures and multi-scale variations of ground objects. To address these, this paper proposes AFM-Net—a novel architecture that deeply integrates the multi-scale local prior modeling capability of convolutional neural networks (CNNs) with the efficient global sequential modeling capacity of Mamba. A hierarchical dynamic fusion mechanism is designed to enable cross-level feature interaction and contextual reconstruction. Furthermore, a Mixture-of-Experts (MoE) classification module is introduced to adaptively route features and enhance fine-grained discriminability. Extensive experiments demonstrate state-of-the-art performance: AFM-Net achieves 93.72%, 95.54%, and 96.92% classification accuracy on the AID, NWPU-RESISC45, and UC Merced benchmarks, respectively—outperforming existing methods while achieving a superior balance between accuracy and computational efficiency.

📝 Abstract
Remote sensing image scene classification remains challenging, primarily because of the complex spatial structures and multi-scale characteristics of ground objects. Among existing approaches, CNNs excel at modeling local textures, while Transformers excel at capturing global context. However, integrating the two efficiently remains a bottleneck because of the high computational cost of Transformers. To tackle this, we propose AFM-Net, a novel Advanced Hierarchical Fusing framework that achieves effective local and global co-representation through two pathways: a CNN branch that extracts hierarchical visual priors and a Mamba branch that performs efficient global sequence modeling. The core innovation of AFM-Net is its Hierarchical Fusion Mechanism, which progressively aggregates multi-scale features from both pathways, enabling dynamic cross-level feature interaction and contextual reconstruction to produce highly discriminative representations. The fused features are then routed adaptively through a Mixture-of-Experts classifier module, which dispatches them to the most suitable experts for fine-grained scene recognition. Experiments on AID, NWPU-RESISC45, and UC Merced show that AFM-Net achieves 93.72%, 95.54%, and 96.92% accuracy, respectively, surpassing state-of-the-art methods while balancing performance and efficiency. Code is available at https://github.com/tangyuanhao-qhu/AFM-Net.
Problem

Research questions and friction points this paper is trying to address.

Integrating CNN local features with Transformer global context efficiently
Addressing multi-scale object complexity in remote sensing classification
Overcoming computational bottlenecks in global sequence modeling
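
To make the computational bottleneck concrete, the sketch below compares rough operation counts for quadratic self-attention and for a linear-time sequence scan of the kind Mamba-style models use. The function names, the simplified cost formulas, and the state_size value are illustrative assumptions, not figures from the paper.

```python
def attention_cost(num_tokens: int, dim: int) -> int:
    """Rough multiply-add count for the token mixing in one self-attention layer.

    Counts the N x N score matrix (Q @ K^T) and the weighted sum over V,
    ignoring projections, softmax, and multiple heads.
    """
    return 2 * num_tokens * num_tokens * dim


def linear_scan_cost(num_tokens: int, dim: int, state_size: int = 16) -> int:
    """Rough multiply-add count for a linear-time recurrent scan.

    state_size is an assumed per-channel hidden state width, not a value
    reported in the paper.
    """
    return num_tokens * dim * state_size


def cost_ratio(num_tokens: int, dim: int = 256) -> float:
    """How many times more work attention needs than the scan (under these assumptions)."""
    return attention_cost(num_tokens, dim) / linear_scan_cost(num_tokens, dim)


# Token counts for a 224 x 224 image split into 16 x 16 and 8 x 8 patches.
for num_tokens in (196, 784):
    print(num_tokens, cost_ratio(num_tokens))
```

Under these simplified formulas the ratio grows linearly with the number of tokens, which is why the gap widens for high-resolution remote sensing imagery.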
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Fusion Mechanism integrates CNN and Mamba pathways
Mixture-of-Experts classifier adaptively routes features for recognition
AFM-Net combines hierarchical visual priors with global sequence modeling
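
As a rough illustration of adaptive routing in a Mixture-of-Experts classifier, the following minimal sketch sends one fused feature vector to its top-k experts and mixes their class logits. It shows generic top-k gating under stated assumptions, not the authors' implementation: the function names, the linear experts, top_k=2, and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vector):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def moe_classify(feature, gate_weights, expert_weights, top_k=2):
    """Route one feature vector to its top_k experts and mix their logits.

    gate_weights:   num_experts x feature_dim matrix (assumed linear gate)
    expert_weights: list of num_experts matrices, each num_classes x feature_dim
    Returns (class probabilities, indices of the chosen experts).
    """
    gate_scores = linear(gate_weights, feature)
    # Keep only the top_k highest-scoring experts.
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalise the gate over the chosen experts so the mixing weights sum to 1.
    mix = softmax([gate_scores[i] for i in chosen])
    logits = [0.0] * len(expert_weights[0])
    for weight, idx in zip(mix, chosen):
        expert_logits = linear(expert_weights[idx], feature)
        logits = [acc + weight * val for acc, val in zip(logits, expert_logits)]
    return softmax(logits), chosen


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for trained weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
feature_dim, num_experts, num_classes = 8, 4, 3
gate = random_matrix(num_experts, feature_dim, rng)
experts = [random_matrix(num_classes, feature_dim, rng) for _ in range(num_experts)]
fused_feature = [rng.uniform(-1.0, 1.0) for _ in range(feature_dim)]
probs, chosen = moe_classify(fused_feature, gate, experts)
print(chosen, probs)
```

Only the selected experts are evaluated for each input, so the classifier's capacity can grow with the number of experts while per-sample compute stays close to that of top_k experts.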
Yuanhao Tang
School of Computer Technology and Applications and the Intelligent Computing and Application Laboratory of Qinghai Province, Qinghai University, Xining 810016, China
Xuechao Zou
Key Lab of Big Data and Artificial Intelligence in Transportation (Ministry of Education), School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Zhengpei Hu
School of Computer Technology and Applications and the Intelligent Computing and Application Laboratory of Qinghai Province, Qinghai University, Xining 810016, China
Junliang Xing
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China, and also with the Key Laboratory of Pervasive Computing, Ministry of Education, Beijing 100084, China
Chengkun Zhang
School of Computer Technology and Applications and the Intelligent Computing and Application Laboratory of Qinghai Province, Qinghai University, Xining 810016, China
Jianqiang Huang
Nanyang Technological University, Chinese Academy of Sciences
Computer Vision, Machine Learning, Causality