When Spike Sparsity Does Not Translate to Deployed Cost: VS-WNO on Jetson Orin Nano

📅 2026-04-18

🤖 AI Summary

This study tests whether spike sparsity reduces latency and energy consumption when a spiking neural operator is deployed on an edge GPU. Using the Darcy rectangular benchmark on a Jetson Orin Nano 8 GB, it compares variable-spiking wavelet neural operators (VS-WNO) against matched dense WNOs and profiles the request path with Nsight Systems. Despite measurable algorithmic sparsity, VS-WNO is slower and more costly per inference (59.6 ms, 228.0 mJ dynamic energy) than dense WNO (53.2 ms, 180.7 mJ). Profiling shows a launch-dominated, dense execution path in both models: the runtime does not suppress dense work as spike activity falls, so algorithmic sparsity does not translate into lower deployed cost on this edge-GPU software stack.

📝 Abstract

Spiking neural operators are appealing for neuromorphic edge computing because event-driven substrates can, in principle, translate sparse activity into lower latency and energy. Whether that advantage survives deployment on commodity edge-GPU software stacks, however, remains unclear. We study this question on a Jetson Orin Nano 8 GB using five pretrained variable-spiking wavelet neural operator (VS-WNO) checkpoints and five matched dense wavelet neural operator (WNO) checkpoints on the Darcy rectangular benchmark. On a reference-aligned path, VS-WNO exhibits substantial algorithmic sparsity, with mean spike rates decreasing from 54.26% at the first spiking layer to 18.15% at the fourth. On a deployment-style request path, however, this sparsity does not reduce deployed cost: VS-WNO reaches 59.6 ms latency and 228.0 mJ dynamic energy per inference, whereas dense WNO reaches 53.2 ms and 180.7 mJ, while also achieving slightly lower reference-path error (1.77% versus 1.81%). Nsight Systems indicates that the request path remains launch-dominated and dense rather than sparsity-aware: for VS-WNO, cudaLaunchKernel accounts for 81.6% of CUDA API time within the latency window, and dense convolution kernels account for 53.8% of GPU kernel time; dense WNO shows the same pattern. On this Jetson-class GPU stack, spike sparsity is measurable but does not reduce deployed cost because the runtime does not suppress dense work as spike activity decreases.
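The size of the gap is easier to see as relative overhead. The sketch below uses only the four per-inference figures reported in the abstract. The function names are illustrative, not from the paper, and the mean-power estimate assumes the reported dynamic energy was accumulated over the same window as the reported latency, which the abstract does not state.

```python
# Per-inference figures reported in the abstract (deployment-style request path).
REPORTED = {
    "VS-WNO": {"latency_ms": 59.6, "energy_mj": 228.0},
    "WNO": {"latency_ms": 53.2, "energy_mj": 180.7},
}


def relative_overhead_pct(spiking: float, dense: float) -> float:
    """Return how much larger `spiking` is than `dense`, in percent."""
    return 100.0 * (spiking - dense) / dense


def mean_dynamic_power_w(energy_mj: float, latency_ms: float) -> float:
    """Mean dynamic power in watts (mJ / ms = W).

    Assumption: energy and latency cover the same time window.
    """
    return energy_mj / latency_ms


def summarize(reported=REPORTED) -> dict:
    """Derive relative overheads and mean dynamic power from the reported figures."""
    spiking, dense = reported["VS-WNO"], reported["WNO"]
    return {
        "latency_overhead_pct": relative_overhead_pct(
            spiking["latency_ms"], dense["latency_ms"]
        ),
        "energy_overhead_pct": relative_overhead_pct(
            spiking["energy_mj"], dense["energy_mj"]
        ),
        "vs_wno_power_w": mean_dynamic_power_w(
            spiking["energy_mj"], spiking["latency_ms"]
        ),
        "wno_power_w": mean_dynamic_power_w(
            dense["energy_mj"], dense["latency_ms"]
        ),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

On the reported numbers, VS-WNO is roughly 12% slower and uses roughly 26% more dynamic energy per inference than dense WNO. Because the energy gap is larger than the latency gap, the estimated mean dynamic power is also higher for VS-WNO under the stated assumption.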

Problem

Research questions and friction points this paper is trying to address.

spike sparsity

deployed cost

neuromorphic edge computing

edge-GPU software stacks

latency and energy

Innovation

Methods, ideas, or system contributions that make the work stand out.

spiking neural operators

spike sparsity

edge deployment