AFM-Net: Advanced Fusing Hierarchical CNN Visual Priors with Global Sequence Modeling for Remote Sensing Image Scene Classification

📅 2025-10-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Remote sensing image scene classification is difficult because ground objects have complex spatial structures and vary widely in scale. This paper proposes AFM-Net, an architecture that combines the multi-scale local priors learned by a convolutional neural network (CNN) with the efficient global sequence modeling of Mamba. A hierarchical dynamic fusion mechanism enables cross-level feature interaction and contextual reconstruction, and a Mixture-of-Experts (MoE) classification module adaptively routes the fused features to improve fine-grained discrimination. The authors report classification accuracies of 93.72%, 95.54%, and 96.92% on the AID, NWPU-RESISC45, and UC Merced benchmarks, respectively, which they state surpass existing methods while balancing accuracy and computational efficiency.

📝 Abstract

Remote sensing image scene classification remains a challenging task, primarily due to the complex spatial structures and multi-scale characteristics of ground objects. Among existing approaches, CNNs excel at modeling local textures, while Transformers excel at capturing global context. However, integrating the two efficiently remains a bottleneck because of the high computational cost of Transformers. To tackle this, we propose AFM-Net, a novel Advanced Hierarchical Fusing framework that achieves effective local and global co-representation through two pathways: a CNN branch for extracting hierarchical visual priors, and a Mamba branch for efficient global sequence modeling. The core innovation of AFM-Net lies in its Hierarchical Fusion Mechanism, which progressively aggregates multi-scale features from both pathways, enabling dynamic cross-level feature interaction and contextual reconstruction to produce highly discriminative representations. These fused features are then adaptively routed through a Mixture-of-Experts classifier module, which dispatches them to the most suitable experts for fine-grained scene recognition. Experiments on AID, NWPU-RESISC45, and UC Merced show that AFM-Net obtains 93.72%, 95.54%, and 96.92% accuracy, respectively, surpassing state-of-the-art methods while balancing performance and efficiency. Code is available at https://github.com/tangyuanhao-qhu/AFM-Net.
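
To make the idea of progressively aggregating features from two pathways more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each level's CNN and sequence features have already been pooled to vectors of the same length, that a single scalar sigmoid gate blends them, and that the previous level's fused vector is averaged in as context. The names `gated_fuse` and `hierarchical_fuse` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(cnn_feat, seq_feat, gate_weights, gate_bias=0.0):
    """Blend two same-length vectors with a scalar gate computed from both.

    Assumption: one scalar gate per level; a real model would likely use
    learned per-channel or per-position gates.
    """
    joint = cnn_feat + seq_feat  # list concatenation of the two vectors
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, joint)) + gate_bias)
    fused = [g * c + (1.0 - g) * s for c, s in zip(cnn_feat, seq_feat)]
    return fused, g

def hierarchical_fuse(cnn_levels, seq_levels, gate_weights_per_level):
    """Fuse level by level, averaging in the previous fused vector as context."""
    context = None
    for cnn_feat, seq_feat, w in zip(cnn_levels, seq_levels, gate_weights_per_level):
        fused, _ = gated_fuse(cnn_feat, seq_feat, w)
        if context is not None:
            # Assumption: simple averaging stands in for cross-level interaction.
            fused = [0.5 * (f + c) for f, c in zip(fused, context)]
        context = fused
    return context

if __name__ == "__main__":
    cnn_levels = [[1.0, 1.0], [3.0, 3.0]]   # toy "local" features, two levels
    seq_levels = [[3.0, 3.0], [5.0, 5.0]]   # toy "global" features, two levels
    zero_gates = [[0.0] * 4, [0.0] * 4]     # zero weights give a gate of 0.5
    print(hierarchical_fuse(cnn_levels, seq_levels, zero_gates))
```

With all gate weights set to zero the gate is exactly 0.5, so each level reduces to an average of the two branches. Non-zero weights would let the blend depend on the input.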

Problem

Research questions and friction points this paper is trying to address.

Integrating CNN local features with Transformer global context efficiently

Addressing multi-scale object complexity in remote sensing classification

Overcoming computational bottlenecks in global sequence modeling
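
The computational bottleneck named above can be illustrated with a rough count. Self-attention compares every token with every other token, so its interaction count grows quadratically with sequence length, whereas a state-space scan of the kind Mamba uses visits each token once. The sketch below ignores feature dimension and constant factors, and the image sizes and patch size are illustrative assumptions rather than settings from the paper.

```python
def num_tokens(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side

def pairwise_interactions(n):
    """Token pairs touched by full self-attention (quadratic in n)."""
    return n * n

def scan_steps(n):
    """Steps taken by a single sequential scan (linear in n)."""
    return n

if __name__ == "__main__":
    for size in (224, 448, 896):  # hypothetical input resolutions
        n = num_tokens(size, 16)  # hypothetical patch size of 16
        print(size, n, pairwise_interactions(n), scan_steps(n))
```

Doubling the image side quadruples the token count (196, 784, 3136), which multiplies the pairwise count by 16 (38416, 614656, 9834496) while the scan length only quadruples.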

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Fusion Mechanism integrates CNN and Mamba pathways

Mixture-of-Experts classifier adaptively routes features for recognition

AFM-Net combines hierarchical visual priors with global sequence modeling
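
As a rough illustration of adaptive routing in a Mixture-of-Experts classifier, the sketch below implements generic top-k softmax gating over small linear experts in plain Python. The paper summary does not specify the router, the number of experts, or the value of k, so all of those (and the name `moe_classify`) are assumptions made for this example.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def moe_classify(x, router_w, expert_ws, top_k=2):
    """Route feature vector x to the top_k experts and mix their class logits.

    Assumption: generic top-k gating with renormalised gate weights.
    """
    gate = softmax(linear(router_w, x))  # one gate score per expert
    top = sorted(range(len(gate)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in top)     # renormalise over selected experts
    n_classes = len(expert_ws[0])
    logits = [0.0] * n_classes
    for i in top:
        expert_logits = linear(expert_ws[i], x)
        for c in range(n_classes):
            logits[c] += (gate[i] / norm) * expert_logits[c]
    return logits, top

if __name__ == "__main__":
    x = [1.0, 0.0]                                   # toy fused feature
    router_w = [[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]] # three hypothetical experts
    expert_ws = [
        [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]],
        [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    ]
    print(moe_classify(x, router_w, expert_ws, top_k=1))
```

Only the selected experts are evaluated, which is what keeps this kind of classifier cheap relative to running every expert on every input.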
