🤖 AI Summary
This work addresses the high computational cost of multi-step inference in spiking Transformers, which stems from redundant tokens and from the lack of a temporally aware assessment of token importance. To this end, we propose Uncert, a novel framework that, for the first time, uses the trajectory of class-evidence uncertainty across multiple spiking timesteps as a criterion for token importance. Specifically, token-level class evidence is modeled with Dirichlet distributions, and a training-free, plug-and-play importance score is derived from the mean and volatility of each token's uncertainty over time. This enables effective token pruning without retraining. On both static and neuromorphic vision benchmarks, Uncert consistently achieves superior accuracy–efficiency trade-offs and maintains robust performance even at high pruning ratios, establishing a new paradigm for efficient spiking Transformer inference.
📝 Abstract
Spiking transformers have shown strong potential for neuromorphic vision, yet their token processing across multiple spiking steps still introduces substantial redundancy and inference cost. Existing token reduction methods mainly rely on response-based cues, such as activation magnitude, firing statistics, or feature similarity. Although effective, these criteria do not explicitly characterize token importance from the perspective of temporally evolving class evidence. In spiking transformers, token representations are formed progressively across multiple spiking steps rather than determined at a single instant, which suggests that token importance should be evaluated not only by instantaneous responses but also by temporal uncertainty patterns. Our key observation is that tokens exhibit heterogeneous uncertainty trajectories over time, and that their temporally aggregated uncertainty statistics provide an effective cue for distinguishing informative tokens from redundant ones. Motivated by this, we propose Uncert, a training-free, plug-and-play token importance estimation framework for spiking transformers. Specifically, Uncert models token-wise class evidence with a Dirichlet distribution and summarizes each token's temporal uncertainty by its mean and fluctuation across spiking steps, yielding an uncertainty-aware importance score for token reduction during inference. Experiments on both static and neuromorphic benchmarks show that Uncert achieves favorable accuracy–efficiency trade-offs, with the most consistent gains observed under token pruning. Further analysis reveals a clear empirical connection between temporal uncertainty patterns and token contribution, offering new insights into token dynamics in spiking transformers.
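To make the scoring pipeline concrete, here is a minimal NumPy sketch of Dirichlet-based temporal uncertainty scoring for token pruning. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify how per-token class logits are obtained, how evidence is computed, or how the mean and fluctuation are combined, so the following are all hypothetical choices: per-token logits of shape (T, N, K) are taken as given (for example, from projecting each token through a classification head at every step), evidence is `softplus(logits)`, the uncertainty mass is the subjective-logic quantity u = K / S with S the Dirichlet strength, fluctuation is the standard deviation over steps, and tokens with low mean and low fluctuation of uncertainty are treated as more important via a weighting `beta`. The function names `dirichlet_uncertainty`, `importance_scores`, and `prune_tokens` are likewise hypothetical.

```python
import numpy as np


def dirichlet_uncertainty(logits: np.ndarray) -> np.ndarray:
    """Per-step, per-token uncertainty mass u = K / S.

    logits: shape (T, N, K) for T spiking steps, N tokens, K classes.
    Assumption: evidence = softplus(logits), Dirichlet alpha = evidence + 1.
    """
    evidence = np.logaddexp(0.0, logits)   # numerically stable softplus, >= 0
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1)          # S = sum_k alpha_k, shape (T, N)
    num_classes = logits.shape[-1]
    return num_classes / strength          # u in (0, 1], shape (T, N)


def importance_scores(logits: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Hypothetical score: higher when uncertainty is low and stable over time."""
    u = dirichlet_uncertainty(logits)
    mean_u = u.mean(axis=0)                # temporal mean per token, shape (N,)
    std_u = u.std(axis=0)                  # temporal fluctuation per token, shape (N,)
    return -(mean_u + beta * std_u)


def prune_tokens(tokens: np.ndarray, logits: np.ndarray,
                 keep_ratio: float = 0.5, beta: float = 1.0):
    """Keep the top-scoring tokens at every spiking step.

    tokens: shape (T, N, D). Returns the pruned tokens (T, N_keep, D)
    and the kept token indices in their original order.
    """
    scores = importance_scores(logits, beta)
    n_keep = max(1, int(round(keep_ratio * tokens.shape[1])))
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return tokens[:, keep], keep
```

Because the score is computed only from quantities available at inference time, no retraining is involved, which mirrors the training-free, plug-and-play property described above. Swapping in a different evidence function or combination rule only requires changing `dirichlet_uncertainty` or `importance_scores`.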