Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To preserve asynchrony, spatial sparsity, and recognition accuracy simultaneously in event-camera data representation, this paper proposes Spiking Patches, an asynchronous, sparse tokenization method. It models raw event streams directly as spatiotemporally sparse spike tokens, retaining the intrinsic characteristics of event cameras and avoiding the synchronization and densification inherent in conventional frame- and voxel-based representations. The resulting tokens are compatible with diverse downstream architectures, including graph neural networks, point cloud networks, and Transformers. Evaluated on gesture recognition and object detection, Spiking Patches achieves inference up to 3.4× faster than voxel-based baselines and up to 10.4× faster than frame-based ones, while matching or improving accuracy, with absolute gains of up to 3.8 percentage points on gesture recognition and 1.4 on object detection.

📝 Abstract
We propose tokenization of events and present a tokenizer, Spiking Patches, specifically designed for event cameras. Given a stream of asynchronous and spatially sparse events, our goal is to discover an event representation that preserves these properties. Prior works have represented events as frames or as voxels. However, while these representations yield high accuracy, both frames and voxels are synchronous and decrease the spatial sparsity. Spiking Patches gives the means to preserve the unique properties of event cameras and we show in our experiments that this comes without sacrificing accuracy. We evaluate our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and object detection. Tokens from Spiking Patches yield inference times that are up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We achieve this while matching their accuracy and even surpassing in some cases with absolute improvements up to 3.8 for gesture recognition and up to 1.4 for object detection. Thus, tokenization constitutes a novel direction in event-based vision and marks a step towards methods that preserve the properties of event cameras.
Problem

Research questions and friction points this paper is trying to address.

Preserving asynchronous and sparse properties of event cameras
Overcoming synchronous limitations of frame and voxel representations
Achieving faster inference without sacrificing recognition accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tokenizes events into asynchronous, sparse patches
Preserves event camera properties without sacrificing accuracy
Achieves faster inference than voxel and frame methods
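The idea behind the contributions above can be sketched in a few lines. The paper does not publish its exact algorithm here, so the following is a minimal illustrative sketch, assuming tokens are formed by buffering events per spatial patch and emitting a token once a patch accumulates enough activity; the `patch_size` and `threshold` parameters and the `tokenize_events` function are hypothetical, not the authors' API. Note how tokens appear asynchronously (at event timestamps) and only for spatially active patches, which is the property the paper aims to preserve.

```python
from collections import defaultdict

def tokenize_events(events, patch_size=8, threshold=4):
    """Group asynchronous events into sparse patch tokens.

    `events` is an iterable of (x, y, t, p) tuples sorted by
    timestamp t. A token is emitted for a patch as soon as it
    has buffered `threshold` events, so tokens are produced
    asynchronously and only for patches that actually fire.
    Both parameters are illustrative assumptions, not values
    from the paper.
    """
    buffers = defaultdict(list)  # patch index -> buffered events
    tokens = []
    for (x, y, t, p) in events:
        key = (x // patch_size, y // patch_size)
        buffers[key].append((x, y, t, p))
        if len(buffers[key]) >= threshold:
            # Emit a token stamped with the triggering event's time.
            tokens.append({"patch": key, "t": t, "events": buffers[key]})
            buffers[key] = []
    return tokens
```

Tokens produced this way could then be fed to a GNN, PCN, or Transformer as node, point, or sequence inputs, which matches the downstream architectures the paper evaluates.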