A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal remote sensing imagery is heterogeneous and redundant, which makes structural and detail features hard to extract. To address this, the paper proposes the Spatial-Spectral-Frequency Interaction Network (S²Fin), which adds frequency-domain learning to spatial-spectral fusion. A high-frequency sparse enhancement Transformer uses sparse spatial-spectral attention to optimize the parameters of a high-frequency filter. A two-level spatial-frequency fusion strategy then combines an adaptive frequency channel module, which fuses low-frequency structures with the enhanced high-frequency details, and a high-frequency resonance mask, which emphasizes sharp edges via phase similarity. A spatial-spectral attention fusion module further strengthens feature extraction at intermediate layers. On four benchmark multimodal remote sensing datasets with limited labeled data, the paper reports that S²Fin outperforms state-of-the-art methods.

📝 Abstract
Deep learning-based methods have achieved significant success in remote sensing Earth observation data analysis. Numerous feature fusion techniques address multimodal remote sensing image classification by integrating global and local features. However, these techniques often struggle to extract structural and detail features from heterogeneous and redundant multimodal images. To bring frequency-domain learning to the modeling of key and sparse detail features, this paper introduces the spatial-spectral-frequency interaction network (S$^2$Fin), which integrates pairwise fusion modules across the spatial, spectral, and frequency domains. Specifically, we propose a high-frequency sparse enhancement transformer that employs sparse spatial-spectral attention to optimize the parameters of the high-frequency filter. Subsequently, a two-level spatial-frequency fusion strategy is introduced, comprising an adaptive frequency channel module that fuses low-frequency structures with enhanced high-frequency details, and a high-frequency resonance mask that emphasizes sharp edges via phase similarity. In addition, a spatial-spectral attention fusion module further enhances feature extraction at intermediate layers of the network. Experiments on four benchmark multimodal datasets with limited labeled data demonstrate that S$^2$Fin achieves superior classification performance, outperforming state-of-the-art methods. The code is available at https://github.com/HaoLiu-XDU/SSFin.
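The two frequency-domain ideas named in the abstract, separating low-frequency structure from high-frequency detail and comparing phase across modalities, can be illustrated with a minimal NumPy sketch. This is not the S$^2$Fin implementation: the fixed radial mask, the `cutoff` value, and the function names `split_frequency_bands` and `phase_similarity` are assumptions made for illustration, whereas the paper describes learned filters and modules.

```python
import numpy as np


def split_frequency_bands(image, cutoff=0.25):
    """Split a 2-D image into low- and high-frequency parts.

    Uses a fixed radial mask in the Fourier domain. `cutoff` is a
    hypothetical normalized radius (cycles per pixel), not a value
    taken from the paper.
    """
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    low_mask = radius <= cutoff
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * ~low_mask).real
    return low, high


def phase_similarity(a, b, eps=1e-8):
    """Cosine of the per-frequency phase difference of two images.

    Values lie in [-1, 1]; 1 means the two spectra are in phase at
    that frequency. This is an illustrative stand-in for the phase
    similarity mentioned in the abstract.
    """
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    return cross.real / (np.abs(cross) + eps)


# Usage with synthetic data standing in for two co-registered modalities.
rng = np.random.default_rng(0)
modality_a = rng.random((16, 16))
modality_b = rng.random((16, 16))

low, high = split_frequency_bands(modality_a)
similarity = phase_similarity(modality_a, modality_b)
```

Because the two masks are complementary, `low + high` reconstructs the input, so the split loses no information. A learned filter would replace the hard mask with trainable weights.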
Problem

Research questions and friction points this paper is trying to address.

Extracting structural and detail features from heterogeneous multimodal images
Addressing feature redundancy in multimodal remote sensing classification
Enhancing sparse detail features through frequency domain learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates spatial-spectral-frequency pairwise fusion modules
Uses high-frequency sparse enhancement transformer with attention
Applies two-level spatial-frequency fusion with adaptive channels
Hao Liu
Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy
Yunhao Gao
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China; Beijing Key Laboratory of Fractional Signals and Systems, Beijing Institute of Technology, Beijing 100081, China
Wei Li
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China; Beijing Key Laboratory of Fractional Signals and Systems, Beijing Institute of Technology, Beijing 100081, China
Mingyang Zhang
School of Electronic Engineering, Xidian University
Computational Intelligence, Remote Sensing, Image Processing
Maoguo Gong
School of Electronic Engineering, Xidian University, Xi’an 710071, China; Key Laboratory of Collaborative Intelligent Systems of Ministry of Education, Xidian University, Xi’an 710071, China; Academy of Artificial Intelligence, Inner Mongolia Normal University, Hohhot 010028, China
Lorenzo Bruzzone
Professor of Telecommunications, University of Trento
Remote Sensing, Synthetic Aperture Radar, Radar, Image Processing, Pattern Recognition