Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification

πŸ“… 2025-09-27
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address modality imbalance induced by pre-trained diffusion models and the challenge of leveraging diffusion features to guide complementary, diverse feature extraction in multimodal remote sensing image classification, this paper proposes the Balanced Diffusion-Guided Fusion (BDGF) framework. BDGF introduces an adaptive modality masking strategy to mitigate modality imbalance; designs a diffusion-feature-guided multi-architecture collaborative learning mechanism integrating CNN, Mamba, and Transformer branches, enhanced by entropy alignment and feature similarity constraints to strengthen inter-branch cooperation; and incorporates grouped channel attention and cross-attention for hierarchical feature fusion. Extensive experiments on four benchmark remote sensing datasets demonstrate significant improvements over state-of-the-art methods. The source code is publicly available.
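The hierarchical fusion step pairs grouped channel attention with cross-attention between branches. As a rough illustration of the cross-attention half only, here is a minimal numpy sketch of single-head scaled dot-product cross-attention; the function and variable names are illustrative and not taken from the BDGF code.

```python
import numpy as np

def cross_attention(query_feats, kv_feats):
    """Single-head scaled dot-product cross-attention: one branch's features
    (queries) attend to another branch's features (keys/values).
    Shapes: (tokens, dim). Illustrative sketch, not the paper's implementation."""
    d_k = kv_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d_k)       # (q_tokens, kv_tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ kv_feats                              # (q_tokens, dim)
```

Each output row is a convex combination of the key/value features, so one branch can selectively pull in complementary information from another.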

πŸ“ Abstract
Deep learning-based techniques for the analysis of multimodal remote sensing data have become popular due to their ability to effectively integrate complementary spatial, spectral, and structural information from different sensors. Recently, denoising diffusion probabilistic models (DDPMs) have attracted attention in the remote sensing community due to their powerful ability to capture robust and complex spatial-spectral distributions. However, pre-training multimodal DDPMs may result in modality imbalance, and effectively leveraging diffusion features to guide complementary, diverse feature extraction remains an open question. To address these issues, this paper proposes a balanced diffusion-guided fusion (BDGF) framework that leverages multimodal diffusion features to guide a multi-branch network for land-cover classification. Specifically, we propose an adaptive modality masking strategy to encourage the DDPMs to obtain a modality-balanced rather than spectral image-dominated data distribution. Subsequently, these diffusion features hierarchically guide feature extraction among CNN, Mamba, and transformer networks by integrating feature fusion, group channel attention, and cross-attention mechanisms. Finally, a mutual learning strategy is developed to enhance inter-branch collaboration by aligning the probability entropy and feature similarity of individual subnetworks. Extensive experiments on four multimodal remote sensing datasets demonstrate that the proposed method achieves superior classification performance. The code is available at https://github.com/HaoLiu-XDU/BDGF.
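The mutual learning strategy aligns the probability entropy and feature similarity of the individual subnetworks. A minimal numpy sketch of one plausible form of such an objective follows; the exact loss terms and weighting in BDGF may differ, and all names here are illustrative.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def mutual_learning_loss(logits_a, logits_b, feat_a, feat_b):
    """Hypothetical two-branch mutual learning objective:
    (1) entropy alignment: penalize the gap between the two branches'
        prediction entropies;
    (2) feature similarity: encourage cosine similarity between features."""
    ent_gap = np.abs(entropy(softmax(logits_a)) - entropy(softmax(logits_b))).mean()
    cos = (feat_a * feat_b).sum(axis=-1) / (
        np.linalg.norm(feat_a, axis=-1) * np.linalg.norm(feat_b, axis=-1) + 1e-12)
    return ent_gap + (1.0 - cos).mean()
```

When both branches produce identical predictions and features the loss is zero; any divergence in prediction confidence or feature direction increases it, nudging the CNN, Mamba, and transformer branches toward cooperation.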
Problem

Research questions and friction points this paper is trying to address.

Addresses modality imbalance in multimodal diffusion models for remote sensing
Guides complementary feature extraction using diffusion features across networks
Enhances inter-branch collaboration through mutual learning strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Balanced diffusion-guided fusion for multimodal classification
Adaptive modality masking strategy for balanced data distribution
Hierarchical feature guidance with CNN, Mamba, and transformer networks
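The adaptive modality masking idea above can be sketched as follows: mask the modality the diffusion model already fits best (the dominant one, typically the spectral image) with the highest probability, so pre-training cannot collapse onto it. This is a minimal numpy sketch of one such heuristic; the paper's actual masking schedule and criteria are not reproduced here.

```python
import numpy as np

def mask_probabilities(per_modality_loss):
    """Masking probability proportional to inverse per-modality loss:
    a well-fit (dominant) modality is masked more often. Illustrative only."""
    inv = 1.0 / (np.asarray(per_modality_loss, dtype=float) + 1e-12)
    return inv / inv.sum()

def adaptive_modality_mask(modalities, per_modality_loss, rng=None):
    """Zero out one randomly chosen modality, biased toward the dominant one.
    modalities: list of arrays (one per modality)."""
    if rng is None:
        rng = np.random.default_rng()
    probs = mask_probabilities(per_modality_loss)
    idx = rng.choice(len(modalities), p=probs)
    masked = [m.copy() for m in modalities]
    masked[idx] = np.zeros_like(masked[idx])
    return masked, idx
```

With losses of 0.1 for the spectral branch and 0.9 for the complementary one, the spectral modality is masked roughly 90% of the time, pushing the model toward a modality-balanced data distribution.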
Hao Liu
Department of Information Engineering and Computer Science, University of Trento, 38123 Trento, Italy
Yongjie Zheng
Professor, California State University San Marcos
Software Engineering, Software Architecture, Software Product Lines, Self-Adaptive Systems
Yuhan Kang
College of Electrical and Information Engineering, Hunan University, Changsha 410082, China
Mingyang Zhang
School of Electronic Engineering, Xidian University
Computational Intelligence, Remote Sensing, Image Processing
Maoguo Gong
Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China and the Academy of Artificial Intelligence, College of Mathematics Science, Inner Mongolia Normal University, Hohhot 010028, China
Lorenzo Bruzzone
Professor of Telecommunications, University of Trento
Remote Sensing, Synthetic Aperture Radar, Radar, Image Processing, Pattern Recognition