When Spike Sparsity Does Not Translate to Deployed Cost: VS-WNO on Jetson Orin Nano

📅 2026-04-18

🤖 AI Summary
This study asks whether spike sparsity reduces latency and energy when a spiking neural operator is deployed on an edge GPU. Using the Darcy rectangular benchmark on a Jetson Orin Nano 8 GB, the authors compare variable-spiking wavelet neural operators (VS-WNO) against matched dense wavelet neural operators (WNO) and profile the runtime with Nsight Systems. Despite its algorithmic sparsity, VS-WNO shows higher latency (59.6 ms) and higher dynamic energy per inference (228.0 mJ) than dense WNO (53.2 ms, 180.7 mJ). The profile points to a launch-dominated, dense request path: on this commodity edge-GPU software stack the runtime does not skip dense work as spike activity decreases, so algorithmic sparsity does not translate into lower deployed cost.

📝 Abstract
Spiking neural operators are appealing for neuromorphic edge computing because event-driven substrates can, in principle, translate sparse activity into lower latency and energy. Whether that advantage survives deployment on commodity edge-GPU software stacks, however, remains unclear. We study this question on a Jetson Orin Nano 8 GB using five pretrained variable-spiking wavelet neural operator (VS-WNO) checkpoints and five matched dense wavelet neural operator (WNO) checkpoints on the Darcy rectangular benchmark. On a reference-aligned path, VS-WNO exhibits substantial algorithmic sparsity, with mean spike rates decreasing from 54.26% at the first spiking layer to 18.15% at the fourth. On a deployment-style request path, however, this sparsity does not reduce deployed cost: VS-WNO reaches 59.6 ms latency and 228.0 mJ dynamic energy per inference, whereas dense WNO reaches 53.2 ms and 180.7 mJ, while also achieving slightly lower reference-path error (1.77% versus 1.81%). Nsight Systems indicates that the request path remains launch-dominated and dense rather than sparsity-aware: for VS-WNO, cudaLaunchKernel accounts for 81.6% of CUDA API time within the latency window, and dense convolution kernels account for 53.8% of GPU kernel time; dense WNO shows the same pattern. On this Jetson-class GPU stack, spike sparsity is measurable but does not reduce deployed cost because the runtime does not suppress dense work as spike activity decreases.
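
The gap between the two models is easier to read as relative overheads. The following is a minimal sketch that only redoes arithmetic on the four figures quoted in the abstract; the names `REPORTED`, `relative_overhead`, `mean_dynamic_power_w`, and `summarize` are hypothetical helpers, not part of the paper's code. The implied-power figure assumes that dynamic energy and latency were measured over the same per-inference window, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract (per inference, deployment-style request path).
REPORTED = {
    "VS-WNO": {"latency_ms": 59.6, "energy_mj": 228.0},
    "dense WNO": {"latency_ms": 53.2, "energy_mj": 180.7},
}


def relative_overhead(spiking: float, dense: float) -> float:
    """Return how much larger `spiking` is than `dense`, in percent."""
    return (spiking / dense - 1.0) * 100.0


def mean_dynamic_power_w(energy_mj: float, latency_ms: float) -> float:
    """Mean dynamic power in watts (mJ divided by ms is numerically W).

    Assumption: energy and latency cover the same measurement window.
    """
    return energy_mj / latency_ms


def summarize(reported=REPORTED) -> dict:
    """Derive relative overheads and implied mean power from the reported values."""
    spiking, dense = reported["VS-WNO"], reported["dense WNO"]
    return {
        "latency_overhead_pct": relative_overhead(
            spiking["latency_ms"], dense["latency_ms"]
        ),
        "energy_overhead_pct": relative_overhead(
            spiking["energy_mj"], dense["energy_mj"]
        ),
        "vs_wno_power_w": mean_dynamic_power_w(
            spiking["energy_mj"], spiking["latency_ms"]
        ),
        "dense_wno_power_w": mean_dynamic_power_w(
            dense["energy_mj"], dense["latency_ms"]
        ),
    }


summary = summarize()

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions VS-WNO costs roughly 12% more latency and roughly 26% more dynamic energy per inference than dense WNO. Energy grows faster than latency, so the implied mean dynamic power is also higher for VS-WNO, which is consistent with the abstract's reading that the spiking path adds work to a dense execution rather than removing it.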
Problem

Research questions and friction points this paper is trying to address.

spike sparsity
deployed cost
neuromorphic edge computing
edge-GPU software stacks
latency and energy
Innovation

Methods, ideas, or system contributions that make the work stand out.

spiking neural operators
spike sparsity
edge deployment
neuromorphic computing
GPU runtime efficiency