A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification

📅 2026-04-26

🤖 AI Summary
This work addresses two challenges in hyperspectral image (HSI) classification: fusing spatial and spectral information effectively, and preserving critical features as they propagate through the network layers. The authors propose a synergistic CNN-Transformer network in which CNNs and a Vision Transformer (ViT) process spatial and spectral features separately. A Twin-Branch Feature Extraction (TBFE) module applies 3D and 2D convolutions in parallel to extract spectral and spatial features, a hybrid pooling attention (HPA) module aggregates spatial attention, and a cascade transformer encoder extracts global spectral features. A cross-layer feature fusion (CFF) module reduces the loss of crucial information from earlier layers. Experiments on several representative hyperspectral datasets are reported to show better performance than state-of-the-art methods.
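The cross-layer feature fusion idea can be illustrated with a minimal NumPy sketch. The summary does not specify how the fusion is computed, so everything here is an assumption made for illustration: the function name `cross_layer_fuse`, the choice of channel-wise concatenation followed by a 1x1 projection, and the feature shapes are all hypothetical and are not taken from the authors' code.

```python
import numpy as np

def cross_layer_fuse(shallow, deep, weight):
    """Fuse an earlier-layer feature map with a deeper one (illustrative only).

    shallow, deep: arrays of shape (C, H, W) with identical shapes.
    weight: array of shape (C_out, 2 * C), acting as a 1x1 convolution.
    """
    if shallow.shape != deep.shape:
        raise ValueError("this sketch assumes equal feature shapes")
    # Assumption: fusion = concatenate along channels, then project.
    stacked = np.concatenate([shallow, deep], axis=0)  # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", weight, stacked)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 5, 5))   # hypothetical early-layer features
deep = rng.standard_normal((8, 5, 5))      # hypothetical deep-layer features
weight = rng.standard_normal((8, 16)) / 4.0
fused = cross_layer_fuse(shallow, deep, weight)
print(fused.shape)
```

Because the projection sees both inputs at every pixel, it can pass early-layer detail through even when the deeper features have discarded it, which is the motivation the summary gives for fusing across layers.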

📝 Abstract
In the hyperspectral image (HSI) classification task, each pixel is assigned to a specific land-cover class or material. Convolutional neural networks (CNNs) and transformers have been widely used to extract local and non-local features in HSI classification. Recent works have utilized a multi-scale vision transformer (ViT) to enhance spectral feature capture and yield promising results. However, most existing methods still face challenges in the effective joint use of spatial-spectral information and in preserving information across layers during propagation. To address these issues, we propose a synergistic CNN-Transformer network with pooling attention fusion for HSI classification, which collaboratively utilizes CNNs and ViT to process spatial and spectral features separately. Specifically, we propose a Twin-Branch Feature Extraction (TBFE) module, which employs 3D and 2D convolution in parallel to comprehensively extract spectral and spatial features from HSI. A hybrid pooling attention (HPA) module is designed to aggregate spatial attention. Moreover, a cascade transformer encoder is employed for global spectral feature extraction, and a simple yet efficient cross-layer feature fusion (CFF) module is designed to reduce the loss of crucial information from earlier network layers. Extensive experiments are conducted on several representative datasets to demonstrate the superior performance of our proposed method compared with state-of-the-art methods. Code is available at https://github.com/chenpeng052/SCT-Net.git.
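To make the notion of pooling-based spatial attention concrete, here is a small NumPy sketch. The abstract does not define the HPA module's internals, so this is not the authors' design: it assumes that "hybrid" means mixing average and max pooling over the channel axis, and the function name `hybrid_pooling_attention` and the mixing weights `w_avg` and `w_max` are hypothetical. The linked repository is the place to check the actual module.

```python
import numpy as np

def hybrid_pooling_attention(features, w_avg=0.5, w_max=0.5):
    """Reweight a (C, H, W) feature map with a pooled spatial attention map.

    Assumption: the attention logits are a weighted sum of channel-wise
    average pooling and channel-wise max pooling.
    """
    avg_map = features.mean(axis=0)            # (H, W) average over channels
    max_map = features.max(axis=0)             # (H, W) maximum over channels
    logits = w_avg * avg_map + w_max * max_map
    attention = 1.0 / (1.0 + np.exp(-logits))  # sigmoid, values in (0, 1)
    # Broadcast the (H, W) map across all channels.
    return features * attention[None, :, :], attention

rng = np.random.default_rng(0)
patch = rng.standard_normal((30, 9, 9))  # hypothetical: 30 channels, 9x9 patch
weighted, attention = hybrid_pooling_attention(patch)
print(weighted.shape, attention.shape)
```

Average pooling summarizes the overall response at each pixel while max pooling keeps the strongest single-channel response, so combining them gives the attention map access to both kinds of evidence.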
Problem

Research questions and friction points this paper is trying to address.

hyperspectral image classification
spatial-spectral information
feature preservation
CNN-Transformer synergy
Innovation

Methods, ideas, or system contributions that make the work stand out.

CNN-Transformer synergy
pooling attention fusion
twin-branch feature extraction
hyperspectral image classification
cross-layer feature fusion
Peng Chen
Ph.D. student, East China Normal University
Time Series Forecasting, LLM, Foundation Models
Wenxuan He
College of Engineering, Shantou University, Shantou, 515063, Guangdong, China; School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
Feng Qian
Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, 130033, Jilin, China
Guangyao Shi
University of Southern California
multi-robot coordination, task allocation/scheduling, route planning
Jingwen Yan
Indiana University Purdue University Indianapolis
Machine learning, Network Science, Alzheimer's disease