🤖 AI Summary
To address data scarcity, high training costs, and the difficulty of joint spatial-spectral modeling under few-shot conditions in hyperspectral target tracking, this paper proposes a lightweight and efficient framework for adapting large pretrained transformer-based foundation models to snapshot hyperspectral tracking. Methodologically, it introduces: (1) an adaptive, learnable spatial-spectral token fusion module that explicitly models cross-dimensional feature interactions and can be attached to any transformer-based backbone; (2) a cross-modality training pipeline that enables effective learning across hyperspectral datasets collected with different sensor modalities, extracting complementary knowledge from additional modalities whether or not they are present at test time; and (3) a training recipe that reaches strong performance with only minimal training iterations. Under scarce annotation settings, the framework improves tracking accuracy and the robustness of spectral discrimination on benchmark hyperspectral tracking datasets.
📝 Abstract
Hyperspectral object tracking using snapshot mosaic cameras is emerging because it provides enhanced spectral information alongside spatial data, contributing to a more comprehensive understanding of material properties. Transformers have consistently outperformed convolutional neural networks (CNNs) at learning feature representations and would therefore be expected to be effective for hyperspectral object tracking. However, training large transformers requires extensive datasets and prolonged training periods. This is particularly critical for complex tasks like object tracking, and the scarcity of large datasets in the hyperspectral domain is a bottleneck to realizing the full potential of powerful transformer models. This paper proposes an effective methodology that adapts large pretrained transformer-based foundation models for hyperspectral object tracking. We propose an adaptive, learnable spatial-spectral token fusion module that can be extended to any transformer-based backbone for learning the inherent spatial-spectral features in hyperspectral data. Furthermore, our model incorporates a cross-modality training pipeline that facilitates effective learning across hyperspectral datasets collected with different sensor modalities. This enables the extraction of complementary knowledge from additional modalities, whether or not they are present during testing. Our proposed model also achieves strong performance with minimal training iterations.
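The abstract does not specify how the spatial-spectral token fusion module is implemented. As a rough, hypothetical illustration of the general idea (not the paper's actual design), the sketch below lets spatial patch tokens attend over spectral band tokens via cross-attention, then blends the attended spectral context back into the spatial stream through a learnable sigmoid gate. All names, dimensions, and the gating scheme are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SpatialSpectralFusion:
    """Hypothetical sketch of a learnable spatial-spectral token fusion:
    spatial tokens query spectral tokens (cross-attention), and a
    learnable per-channel gate controls how much spectral context is
    mixed back into each spatial token. Not the paper's actual module."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.Wq = rng.normal(0.0, s, (dim, dim))  # query projection (spatial)
        self.Wk = rng.normal(0.0, s, (dim, dim))  # key projection (spectral)
        self.Wv = rng.normal(0.0, s, (dim, dim))  # value projection (spectral)
        self.gate = np.zeros(dim)                 # learnable; sigmoid(0) = 0.5 at init

    def __call__(self, spatial, spectral):
        # spatial: (N_s, dim) patch tokens; spectral: (N_b, dim) band tokens
        q = spatial @ self.Wq
        k = spectral @ self.Wk
        v = spectral @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N_s, N_b)
        cross = attn @ v                          # spectral context per spatial token
        g = 1.0 / (1.0 + np.exp(-self.gate))      # sigmoid gate in (0, 1)
        return spatial + g * cross                # gated residual fusion

# Example usage with arbitrary token counts and embedding size
fuse = SpatialSpectralFusion(dim=8)
spatial_tokens = np.random.default_rng(1).normal(size=(5, 8))
spectral_tokens = np.random.default_rng(2).normal(size=(3, 8))
fused = fuse(spatial_tokens, spectral_tokens)    # shape (5, 8), same as spatial
```

Because the fused output keeps the spatial token shape, a module like this could in principle be inserted between existing transformer blocks without altering the backbone, which is one way the "extendable to any transformer-based backbone" property could be achieved.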