TP-Spikformer: Token Pruned Spiking Transformer

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational and memory overhead of deploying spiking Transformers on resource-constrained devices by proposing a training-free token pruning method. The approach evaluates token importance with a heuristic spatiotemporal information-preservation criterion and dynamically removes redundant tokens during inference via a block-level early-exit strategy, substantially reducing computational load. Presented as the first training-agnostic pruning mechanism for spiking Transformers, the method strikes a favorable balance between efficiency and accuracy while remaining broadly applicable across tasks and architectures. Experiments show it enables efficient inference with minimal accuracy degradation on diverse benchmarks, including image classification, object detection, semantic segmentation, and event-based object tracking.

📝 Abstract
Spiking neural networks (SNNs) offer an energy-efficient alternative to traditional neural networks due to their event-driven computing paradigm. However, recent advancements in spiking transformers have focused on improving accuracy with large-scale architectures, which require significant computational resources and limit deployment on resource-constrained devices. In this paper, we propose a simple yet effective token pruning method for spiking transformers, termed TP-Spikformer, that reduces storage and computational overhead while maintaining competitive performance. Specifically, we first introduce a heuristic spatiotemporal information-retaining criterion that comprehensively evaluates tokens' importance, assigning higher scores to informative tokens for retention and lower scores to uninformative ones for pruning. Based on this criterion, we propose an information-retaining token pruning framework that employs a block-level early stopping strategy for uninformative tokens, instead of removing them outright. This also helps preserve more information during token pruning. We demonstrate the effectiveness, efficiency and scalability of TP-Spikformer through extensive experiments across diverse architectures, including Spikformer, QKFormer and Spike-driven Transformer V1 and V3, and a range of tasks such as image classification, object detection, semantic segmentation and event-based object tracking. Particularly, TP-Spikformer performs well in a training-free manner. These results reveal its potential as an efficient and practical solution for deploying SNNs in real-world applications with limited computational resources.
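The core idea described in the abstract, scoring tokens by how much spatiotemporal information they carry and keeping only the most informative ones, can be sketched as follows. This is a minimal illustration, not the paper's actual criterion: the `token_importance` function (mean firing rate over timesteps and channels) and the `keep_ratio` parameter are hypothetical stand-ins for the heuristic described in the paper.

```python
import numpy as np

def token_importance(spikes):
    # spikes: (T, N, D) binary spike tensor over T timesteps,
    # N tokens, and D channels.
    # Hypothetical spatiotemporal score: a token's average firing
    # rate across time and channels (a proxy for information content).
    return spikes.mean(axis=(0, 2))

def prune_tokens(spikes, keep_ratio=0.5):
    # Training-free pruning: rank tokens by the score above and keep
    # the top-k, preserving their original spatial order.
    scores = token_importance(spikes)
    k = max(1, int(round(keep_ratio * spikes.shape[1])))
    keep = np.sort(np.argsort(scores)[-k:])
    return spikes[:, keep, :], keep
```

In the paper's block-level early-stopping variant, low-score tokens would skip the remaining Transformer blocks rather than being discarded outright, which preserves more of their information for the final prediction.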
Problem

Research questions and friction points this paper is trying to address.

Spiking Neural Networks
Token Pruning
Computational Efficiency
Resource-Constrained Deployment
Spiking Transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token Pruning
Spiking Transformer
Event-driven Computing
Information-retaining Criterion
Training-free Deployment