Efficient Spike-driven Transformer for High-performance Drone-View Geo-Localization

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Drone-view geo-localization (DVGL) faces two bottlenecks: conventional artificial neural networks (ANNs) consume a lot of energy because they rely on dense computation, while spiking neural networks (SNNs) lose critical information and struggle to model long-range dependencies because spike-driven computation is sparse. This paper proposes SpikeViMFormer, the first SNN framework designed for DVGL. It introduces two modules: a spike-driven selective attention (SSA) block, which uses spike-driven gating to preserve and enhance critical features, and a spike-driven hybrid state space (SHS) block, which learns long-range dependencies. It also proposes hierarchical re-ranking alignment learning (HRAL), a training strategy that optimizes the backbone directly so that only the backbone is needed at inference. Experiments show that SpikeViMFormer outperforms state-of-the-art SNNs and achieves performance competitive with advanced ANNs. The source code is publicly available.

📝 Abstract
Traditional drone-view geo-localization (DVGL) methods based on artificial neural networks (ANNs) have achieved remarkable performance. However, ANNs rely on dense computation, which results in high power consumption. In contrast, spiking neural networks (SNNs), which benefit from spike-driven computation, inherently provide low power consumption. Regrettably, the potential of SNNs for DVGL has yet to be thoroughly investigated. Meanwhile, in representation learning scenarios, the inherent sparsity of spike-driven computation also causes loss of critical information and makes it difficult to learn long-range dependencies when aligning heterogeneous visual data sources. To address these issues, we propose SpikeViMFormer, the first SNN framework designed for DVGL. In this framework, a lightweight spike-driven transformer backbone is adopted to extract coarse-grained features. To mitigate the loss of critical information, the spike-driven selective attention (SSA) block is designed, which uses a spike-driven gating mechanism to achieve selective feature enhancement and highlight discriminative regions. Furthermore, a spike-driven hybrid state space (SHS) block is introduced to learn long-range dependencies using a hybrid state space. Moreover, only the backbone is utilized during the inference stage to reduce computational cost. To ensure backbone effectiveness, a novel hierarchical re-ranking alignment learning (HRAL) strategy is proposed. It refines features via neighborhood re-ranking and maintains cross-batch consistency to directly optimize the backbone. Experimental results demonstrate that SpikeViMFormer outperforms state-of-the-art SNNs. Compared with advanced ANNs, it also achieves competitive performance. Our code is available at https://github.com/ISChenawei/SpikeViMFormer
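The contrast the abstract draws between dense and spike-driven computation can be illustrated with a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a common spiking neuron model. This is not taken from the SpikeViMFormer code: the function names, the threshold, the decay factor, and the hard reset are all assumptions chosen for illustration. It shows that a spiking neuron emits binary outputs, so a downstream weighted sum only needs additions at the positions where a spike occurred.

```python
def lif_neuron(inputs, threshold=1.0, decay=0.5):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Returns a list of binary spikes (0 or 1), one per input value.
    The threshold, decay, and hard reset are illustrative assumptions.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leak part of the stored potential, then integrate the new input.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def spike_driven_sum(spikes, weights):
    """Weighted sum over binary inputs: additions only, and only where a spike fired."""
    return sum(w for s, w in zip(spikes, weights) if s == 1)


spikes = lif_neuron([0.6, 0.6, 0.1, 0.9, 0.2])
total = spike_driven_sum(spikes, [0.5, -0.2, 0.3, 0.7, 0.1])
```

With these inputs only the fourth time step crosses the threshold, so four of the five weights are never touched. That sparsity is the source of the low power consumption, and it is also why information carried by sub-threshold inputs can be lost.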
Problem

Research questions and friction points this paper is trying to address.

Proposes a low-power spiking neural network for drone geo-localization.
Addresses information loss in spike-driven computation via selective attention.
Enhances long-range dependency learning in heterogeneous visual data alignment.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spike-driven transformer backbone extracts coarse-grained features.
Spike-driven selective attention block enhances discriminative regions.
Spike-driven hybrid state space block learns long-range dependencies.
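The idea of spike-driven gating for selective enhancement can be sketched as follows. This is a simplified illustration, not the paper's SSA block: the function names, the fixed threshold, the per-element gate scores, and the residual form of the enhancement are all hypothetical. A binary gate is produced by thresholding a score, and features are reinforced only where the gate fires.

```python
def spike_gate(scores, threshold=0.5):
    """Turn real-valued gate scores into binary spikes (1 = keep and enhance)."""
    return [1 if s >= threshold else 0 for s in scores]


def gated_enhance(features, scores, threshold=0.5):
    """Add each feature to itself where the gate fires; pass it through otherwise.

    Because the gate is binary, the enhancement is a selection followed by
    an addition, with no multiplication by a real-valued attention weight.
    """
    gate = spike_gate(scores, threshold)
    return [f + f if g == 1 else f for f, g in zip(features, gate)]


gate = spike_gate([0.1, 0.9, 0.5])
enhanced = gated_enhance([0.2, 1.0, 0.4], [0.1, 0.9, 0.5])
```

Here the second and third features are doubled and the first passes through unchanged, which is one simple way to highlight regions that the gate marks as discriminative.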
Zhongwei Chen
State Key Laboratory for Strength and Vibration of Mechanical Structures, Shaanxi Key Laboratory of Environment and Control for Flight Vehicle, School of Aerospace Engineering, Xi’an Jiaotong University, Xi’an 710049, PR China
Hai-Jun Rong
State Key Laboratory for Strength and Vibration of Mechanical Structures, Shaanxi Key Laboratory of Environment and Control for Flight Vehicle, School of Aerospace Engineering, Xi’an Jiaotong University, Xi’an 710049, PR China
Zhao-Xu Yang
State Key Laboratory for Strength and Vibration of Mechanical Structures, Shaanxi Key Laboratory of Environment and Control for Flight Vehicle, School of Aerospace Engineering, Xi’an Jiaotong University, Xi’an 710049, PR China
Guoqi Li
Professor, Institute of Automation, Chinese Academy of Sciences, Previously Tsinghua University
Brain inspired computing, Spiking neural networks, Brain inspired large models, NeuroAI