🤖 AI Summary
Hyperspectral imagery can contain sub-pixel targets from multiple classes, and conventional per-pixel binary classification handles them with low accuracy and poor generalization. This paper proposes SpecDETR, the first end-to-end framework for multi-class point object detection in hyperspectral images. It uses a hyperspectral-specific Transformer encoder with a novel self-excited subpixel-scale attention mechanism to jointly model spatial-spectral features, formulates detection as a one-to-many set prediction task, and solves it with a lightweight DETR-style decoder. On the newly introduced simulated SPOD benchmark, the method outperforms state-of-the-art hyperspectral target detection methods and generic object detection networks, with faster convergence, more precise localization, and stronger class discrimination. The code and dataset are publicly released.
📝 Abstract
Hyperspectral target detection (HTD) aims to identify specific materials based on spectral information in hyperspectral imagery and can detect extremely small objects, some of which occupy less than one pixel. However, existing HTD methods are built on per-pixel binary classification, which limits their feature representation capability for instance-level objects. In this paper, we rethink hyperspectral target detection from the perspective of point object detection and propose SpecDETR, the first specialized network for hyperspectral multi-class point object detection. SpecDETR dispenses with the visual foundation model used by current object detection frameworks; it treats each pixel of the input image as a token and uses a multi-layer Transformer encoder with self-excited subpixel-scale attention modules to extract joint spatial-spectral features directly from the image. During feature extraction, a self-excited mechanism amplifies object features, which accelerates network convergence. Additionally, SpecDETR regards point object detection as a one-to-many set prediction problem, which yields a concise and efficient DETR decoder that surpasses the state-of-the-art (SOTA) DETR decoder. We develop a simulated hyperSpectral Point Object Detection benchmark, termed SPOD, and for the first time evaluate and compare current object detection networks and HTD methods on hyperspectral point object detection. Extensive experiments demonstrate that SpecDETR outperforms SOTA object detection networks and HTD methods. Our code and dataset are available at https://github.com/ZhaoxuLi123/SpecDETR.
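To make the "each pixel is a token" idea concrete, the following is a minimal NumPy sketch and not the SpecDETR implementation. It assumes the image is stored as an (H, W, C) cube with C spectral bands and that each pixel's spectrum is mapped to an embedding by a single linear projection. The function name `pixels_to_tokens`, the `embed_dim` parameter, and the random projection weights are hypothetical and chosen only for illustration; the actual network and its learned weights are in the released repository.

```python
import numpy as np


def pixels_to_tokens(cube, embed_dim, seed=0):
    """Turn an (H, W, C) hyperspectral cube into H*W tokens of size embed_dim.

    Assumption for illustration: one token per pixel, produced by a linear
    projection of that pixel's C-band spectrum. The weights are random
    stand-ins for parameters a real network would learn.
    """
    height, width, bands = cube.shape
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((bands, embed_dim)) / np.sqrt(bands)
    # Row r * width + c holds the spectrum of pixel (r, c).
    spectra = cube.reshape(height * width, bands)
    return spectra @ weight


# Hypothetical 8 x 8 image with 30 spectral bands.
cube = np.random.default_rng(1).random((8, 8, 30))
tokens = pixels_to_tokens(cube, embed_dim=16)
print(tokens.shape)  # (64, 16)
```

Because no patching or downsampling happens before tokenization, the sequence length equals the number of pixels, so an object that covers a single pixel still has its own token for the encoder to attend to.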