A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the difficulty of extracting structural and detail features from heterogeneous, redundant multimodal remote sensing imagery, this paper proposes the Spatial-Spectral-Frequency Interaction Network (S²Fin), which integrates pairwise fusion modules across the spatial, spectral, and frequency domains. A high-frequency sparse enhancement transformer uses sparse spatial-spectral attention to optimize the parameters of a high-frequency filter. A two-level spatial-frequency fusion strategy then combines an adaptive frequency channel module, which fuses low-frequency structures with enhanced high-frequency details, and a high-frequency resonance mask, which emphasizes sharp edges via phase similarity. A spatial-spectral attention fusion module further enhances feature extraction at intermediate layers. On four benchmark multimodal datasets with limited labeled data, S²Fin outperforms state-of-the-art methods.

📝 Abstract

Deep learning-based methods have achieved significant success in remote sensing Earth observation data analysis. Numerous feature fusion techniques address multimodal remote sensing image classification by integrating global and local features. However, these techniques often struggle to extract structural and detail features from heterogeneous and redundant multimodal images. To bring frequency-domain learning to the modeling of key, sparse detail features, this paper introduces the spatial-spectral-frequency interaction network (S$^2$Fin), which integrates pairwise fusion modules across the spatial, spectral, and frequency domains. Specifically, we propose a high-frequency sparse enhancement transformer that employs sparse spatial-spectral attention to optimize the parameters of the high-frequency filter. Subsequently, a two-level spatial-frequency fusion strategy is introduced, comprising an adaptive frequency channel module that fuses low-frequency structures with enhanced high-frequency details, and a high-frequency resonance mask that emphasizes sharp edges via phase similarity. In addition, a spatial-spectral attention fusion module further enhances feature extraction at intermediate layers of the network. Experiments on four benchmark multimodal datasets with limited labeled data demonstrate that S$^2$Fin achieves superior classification performance, outperforming state-of-the-art methods. The code is available at https://github.com/HaoLiu-XDU/SSFin.
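
To make the frequency-domain vocabulary in the abstract concrete, here is a minimal NumPy sketch of two ideas it mentions: separating low-frequency structure from high-frequency detail, and measuring phase similarity between two images. It is not the authors' implementation. The fixed circular cutoff (`radius`), the cosine-of-phase-difference measure, and the function names `split_frequencies` and `phase_similarity` are all assumptions made for illustration; the paper's high-frequency filter is learned, not fixed.

```python
import numpy as np

def split_frequencies(img, radius):
    """Split a 2-D image into low- and high-frequency parts.

    Assumption: an ideal circular mask of the given radius stands in for
    the learned high-frequency filter described in the paper.
    """
    h, w = img.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    # The two masks are complementary, so low + high reconstructs img.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

def phase_similarity(a, b):
    """Cosine of the per-frequency phase difference, in [-1, 1].

    Assumption: a simple stand-in for the phase similarity that drives
    the high-frequency resonance mask.
    """
    phase_a = np.angle(np.fft.fft2(a))
    phase_b = np.angle(np.fft.fft2(b))
    return np.cos(phase_a - phase_b)
```

Because the two masks partition the spectrum, `low + high` recovers the input up to floating-point error, and `phase_similarity(x, x)` is 1 at every frequency. Edges shared by two modalities tend to show aligned phase, which is the intuition behind using phase to emphasize sharp boundaries.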

Problem

Research questions and friction points this paper is trying to address.

Extracting structural and detail features from heterogeneous multimodal images

Addressing feature redundancy in multimodal remote sensing classification

Enhancing sparse detail features through frequency domain learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

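Integrates pairwise fusion modules across the spatial, spectral, and frequency domains

Uses a high-frequency sparse enhancement transformer with sparse spatial-spectral attention

Applies two-level spatial-frequency fusion with an adaptive frequency channel module and a high-frequency resonance mask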
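
As a rough illustration of what "sparse" attention can mean, the sketch below keeps only the largest `top_k` scores in each query row before the softmax. Top-k selection is a common sparsification choice used here as an assumption; the summary does not say how S²Fin sparsifies its spatial-spectral attention, and the name `topk_sparse_attention` is hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention with per-row top-k sparsification.

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    Assumption: top-k masking is an illustrative choice, not the paper's.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # The top_k-th largest score in each row serves as the row threshold.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    # Scores below the threshold get -inf, so their softmax weight is 0.
    masked = np.where(scores >= kth, scores, -np.inf)
    # Subtract the row maximum for numerical stability.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

With `top_k` equal to the number of keys this reduces to dense softmax attention; smaller values force each query to attend to only a few positions, which is one way to suppress redundant responses.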
