🤖 AI Summary
To address the limitations of fixed receptive fields in multiscale land-cover representation and redundant feature interference introduced by standard self-attention in hyperspectral image (HSI) classification, this paper proposes the Dual Selective Fusion Transformer Network (DSFormer), a spatial-spectral dual-path architecture. Its core contributions are: (1) a Kernel Selective Fusion Transformer Block that adaptively learns optimal convolutional kernel sizes to dynamically adjust the receptive field; and (2) a Token Selective Fusion Transformer Block that jointly models spatial-spectral token importance for weighted fusion of discriminative features. The model integrates multiscale convolutional perception, learnable receptive field selection, and joint spatial-spectral self-attention. Experiments on the PaviaU, Houston, Indian Pines, and WHU-HongHu datasets achieve overall accuracies of 96.59%, 97.66%, 95.17%, and 94.59%, respectively, averaging 2.01% higher than state-of-the-art methods.
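The "learnable receptive field selection" in contribution (1) can be illustrated with a minimal pure-Python sketch: features extracted by convolutional branches of different kernel sizes are fused with softmax gate weights, so the network effectively chooses its receptive field per input. All names here (`kernel_selective_fusion`, `gate_logits`) are hypothetical illustrations, not the paper's actual implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kernel_selective_fusion(branch_features, gate_logits):
    """Fuse multiscale branch features with learned softmax gates.

    branch_features: K feature vectors, one per kernel size (e.g. 3x3, 5x5, 7x7).
    gate_logits: K scalars from a small gating network (hypothetical);
    in training these would be learned, here they are plain inputs.
    """
    weights = softmax(gate_logits)
    fused = [0.0] * len(branch_features[0])
    for w, feat in zip(weights, branch_features):
        for i, v in enumerate(feat):
            fused[i] += w * v  # convex combination of the branches
    return fused
```

With equal logits the fusion reduces to a plain average of the branches; as one gate dominates, the output approaches the corresponding single-scale feature, which is the "selection" behavior.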
📝 Abstract
Transformers have achieved satisfactory results in hyperspectral image (HSI) classification. However, existing Transformer models face two key challenges when dealing with HSI scenes characterized by diverse land-cover types and rich spectral information: (1) a fixed receptive field overlooks the effective contextual scales required by different HSI objects; and (2) invalid self-attention features in context fusion degrade model performance. To address these limitations, we propose a novel Dual Selective Fusion Transformer Network (DSFormer) for HSI classification. DSFormer achieves joint spatial and spectral contextual modeling by flexibly selecting and fusing features across different receptive fields, effectively reducing unnecessary information interference by focusing on the most relevant spatial-spectral tokens. Specifically, we design a Kernel Selective Fusion Transformer Block (KSFTB) that learns an optimal receptive field by adaptively fusing spatial and spectral features across different scales, enhancing the model's ability to accurately identify diverse HSI objects. Additionally, we introduce a Token Selective Fusion Transformer Block (TSFTB), which strategically selects and combines essential tokens during spatial-spectral self-attention fusion to capture the most crucial contexts. Extensive experiments on four benchmark HSI datasets demonstrate that the proposed DSFormer significantly improves land-cover classification accuracy, outperforming existing state-of-the-art methods. Specifically, DSFormer achieves overall accuracies of 96.59%, 97.66%, 95.17%, and 94.59% on the Pavia University, Houston, Indian Pines, and WHU-HongHu datasets, respectively, improvements of 3.19%, 1.14%, 0.91%, and 2.80% over the previous best methods. The code will be available online at https://github.com/YichuXu/DSFormer.
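The token-selection idea behind TSFTB can be sketched as a variant of scaled dot-product attention that keeps only the highest-scoring tokens before normalization, so low-relevance ("invalid") tokens contribute nothing to the fused context. This is a minimal single-query, pure-Python illustration; the function name and the `keep_k` knob are assumptions for the sketch, not the paper's exact mechanism:

```python
import math

def token_selective_attention(query, keys, values, keep_k):
    """Attend only over the keep_k highest-scoring tokens.

    query: a single query vector; keys/values: per-token vectors.
    keep_k: number of tokens retained before softmax (hypothetical knob).
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product relevance score for every token.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Keep indices of the keep_k largest scores; discard the rest entirely.
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:keep_k]
    # Softmax only over the kept tokens.
    m = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - m) for i in kept}
    z = sum(exps.values())
    # Weighted sum of the selected value vectors.
    out = [0.0] * len(values[0])
    for i, e in exps.items():
        for d, v in enumerate(values[i]):
            out[d] += (e / z) * v
    return out
```

With `keep_k` equal to the token count this reduces to standard attention; smaller values yield the selective fusion that suppresses irrelevant context.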