Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conventional capture-and-encode pipelines would draw unsustainable power for gigapixel high-speed video (100–1000 fps), and video snapshot compressive imaging (SCI) based on random sampling offers limited dynamic range and is hard to integrate on chip. Method: We propose an Ultra-Sparse Sampling (USS) scheme, in which each spatial location is exposed in only one sub-frame, and verify it with a digital micromirror device (DMD) encoding system. Ideally the USS measurement decomposes into sub-measurements that image inpainting algorithms can recover; because mismatch between the DMD and the sensor prevents a perfect decomposition, we design BSTFormer, a sparse Transformer that combines local block attention, global sparse attention, and global temporal attention. Contribution/Results: On both simulated and real-world data, the method significantly outperforms previous state-of-the-art algorithms. USS also provides a higher dynamic range than random sampling, and its fixed exposure time makes it a good candidate for a complete on-chip video SCI system.

📝 Abstract
Digital cameras consume ~0.1 microjoule per pixel to capture and encode video, resulting in a power usage of ~20W for a 4K sensor operating at 30 fps. For gigapixel cameras operating at 100-1000 fps, the current processing model is unsustainable. To address this, physical-layer compressive measurement has been proposed to reduce power consumption per pixel by 10-100X. Video Snapshot Compressive Imaging (SCI) introduces high-frequency modulation in the optical sensor layer to increase the effective frame rate. A commonly used sampling strategy for video SCI is Random Sampling (RS), in which each mask element is randomly set to 0 or 1. Separately, image inpainting (I2P) has demonstrated that images can be recovered from a fraction of their pixels. Inspired by I2P, we propose the Ultra-Sparse Sampling (USS) regime, in which, at each spatial location, the mask is set to 1 in only one sub-frame and to 0 in all others. We then build a Digital Micro-mirror Device (DMD) encoding system to verify the effectiveness of the USS strategy. Ideally, the USS measurement can be decomposed into sub-measurements, to which I2P algorithms can be applied to recover the high-speed frames. However, due to the mismatch between the DMD and the CCD sensor, the USS measurement cannot be perfectly decomposed. To this end, we propose BSTFormer, a sparse TransFormer that utilizes local Block attention, global Sparse attention, and global Temporal attention to exploit the sparsity of the USS measurement. Extensive experiments on both simulated and real-world data show that our method significantly outperforms all previous state-of-the-art algorithms. Additionally, an essential advantage of the USS strategy is its higher dynamic range compared with the RS strategy. Finally, from the application perspective, the USS strategy is a good choice for implementing a complete video SCI system on chip because of its fixed exposure time.
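
The ideal, mismatch-free case described in the abstract can be illustrated with a small sketch. It builds USS masks, forms a single snapshot, and splits that snapshot back into one sparse sub-measurement per sub-frame. The function names (`uss_masks`, `snapshot`, `decompose`), the toy sizes, and the choice of drawing the open sub-frame uniformly at random per pixel are assumptions made for illustration; the abstract does not specify how the open sub-frame is chosen, and this is not the authors' implementation.

```python
import random

def uss_masks(num_frames, height, width, seed=0):
    """Build binary masks in which each pixel is open in exactly one sub-frame."""
    rng = random.Random(seed)
    # Assumption: the open sub-frame is drawn uniformly at random per pixel.
    open_at = [[rng.randrange(num_frames) for _ in range(width)] for _ in range(height)]
    return [
        [[1 if open_at[r][c] == t else 0 for c in range(width)] for r in range(height)]
        for t in range(num_frames)
    ]

def snapshot(frames, masks):
    """Sum the mask-modulated sub-frames into one 2D measurement."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(m[r][c] * f[r][c] for f, m in zip(frames, masks)) for c in range(width)]
        for r in range(height)
    ]

def decompose(measurement, masks):
    """Split the snapshot into one sparse sub-measurement per sub-frame.

    Unobserved pixels are marked None; an inpainting step would fill them in.
    """
    return [
        [[v if keep else None for v, keep in zip(m_row, k_row)]
         for m_row, k_row in zip(measurement, mask)]
        for mask in masks
    ]

# Toy scene: 4 sub-frames of 6 x 6 pixels with random intensities in [0, 1).
T, H, W = 4, 6, 6
scene_rng = random.Random(1)
frames = [[[scene_rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]

masks = uss_masks(T, H, W)
measurement = snapshot(frames, masks)
sub_measurements = decompose(measurement, masks)
```

Because every pixel is open in exactly one sub-frame, each observed value in `sub_measurements[t]` is an untouched sample of frame `t`, which is what turns reconstruction into an inpainting problem. With a real DMD and sensor the pixel correspondence is imperfect, so this clean split no longer holds exactly.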
Problem

Research questions and friction points this paper is trying to address.

Reducing power consumption in ultra-high-resolution video sensors
Recovering high-speed video from ultra-sparse compressive measurements
Addressing hardware mismatch in snapshot compressive imaging systems
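
The power-consumption point can be checked with the figures quoted in the abstract. The sketch below multiplies pixels per frame, frame rate, and energy per pixel. The helper name `capture_power_watts` and the reading of "4K" as 3840 x 2160 pixels are assumptions; with them the 4K case comes out at roughly 25 W, the same order as the ~20W quoted.

```python
ENERGY_PER_PIXEL_J = 0.1e-6  # ~0.1 microjoule per pixel, as quoted in the abstract

def capture_power_watts(pixels, fps, energy_per_pixel=ENERGY_PER_PIXEL_J):
    """Power = pixels per frame x frames per second x energy per pixel."""
    return pixels * fps * energy_per_pixel

# Assumption: "4K" means 3840 x 2160 pixels.
power_4k = capture_power_watts(3840 * 2160, 30)

# Gigapixel sensor at the low and high ends of the 100-1000 fps range.
power_gp_low = capture_power_watts(1_000_000_000, 100)
power_gp_high = capture_power_watts(1_000_000_000, 1000)

# The abstract's 10-100X per-pixel saving, applied to the high-end case.
power_gp_high_compressive = (power_gp_high / 100, power_gp_high / 10)

print(f"4K @ 30 fps: {power_4k:.1f} W")
print(f"Gigapixel @ 100 fps: {power_gp_low / 1e3:.0f} kW")
print(f"Gigapixel @ 1000 fps: {power_gp_high / 1e3:.0f} kW")
```

At the same per-pixel energy, a gigapixel sensor lands in the kilowatt range, which is why the abstract calls the current processing model unsustainable.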
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ultra-Sparse Sampling (USS), which activates a single sub-frame per spatial location
DMD-based encoding system built to verify USS on real hardware
BSTFormer, a sparse Transformer combining local block, global sparse, and global temporal attention
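
One likely contributor to the dynamic-range advantage of single sub-frame activation can be seen by counting how many sub-frames accumulate at each pixel. This explanation is an assumption inferred from the mask definitions, not a statement from the abstract, and the helper names and toy sizes are hypothetical.

```python
import random

def rs_masks(num_frames, num_pixels, rng):
    """Random Sampling: every mask element is independently 0 or 1."""
    return [[rng.randint(0, 1) for _ in range(num_pixels)] for _ in range(num_frames)]

def uss_masks(num_frames, num_pixels, rng):
    """Ultra-Sparse Sampling: each pixel is open in exactly one sub-frame."""
    open_at = [rng.randrange(num_frames) for _ in range(num_pixels)]
    return [[1 if open_at[p] == t else 0 for p in range(num_pixels)] for t in range(num_frames)]

def exposures_per_pixel(masks):
    """Count how many sub-frames contribute to each pixel of the snapshot."""
    return [sum(column) for column in zip(*masks)]

rng = random.Random(0)
T, P = 8, 64  # 8 sub-frames, 64 pixels (flattened for brevity)

rs_counts = exposures_per_pixel(rs_masks(T, P, rng))
uss_counts = exposures_per_pixel(uss_masks(T, P, rng))

# Worst-case accumulated signal if every sub-frame reaches a peak intensity of 1.0.
rs_peak = max(rs_counts)
uss_peak = max(uss_counts)
```

Under RS a pixel may integrate several sub-frames, so each one must stay well below the sensor's saturation level. Under USS every pixel integrates exactly one sub-frame, leaving the full sensor range available to it.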