Rethinking High-speed Image Reconstruction Framework with Spike Camera

📅 2025-01-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Problem: Spike cameras suffer from poor image-reconstruction quality under low-light conditions, exhibiting low sharpness, severe noise, and insufficient brightness, while existing supervised methods rely heavily on synthetic data, leading to domain shift and inaccurate noise modeling. Method: This paper proposes an unpaired, weakly supervised reconstruction framework that, for the first time, integrates CLIP-based cross-modal alignment into spike-based image reconstruction. Leveraging scene text descriptions and unpaired high-quality natural images as weak supervision, the method employs spike-sequence encoding, text–image feature alignment, and text-guided reconstruction. Contribution/Results: By eliminating the dependence on synthetic data and explicit noise modeling, the approach achieves domain-agnostic, robust reconstruction. It significantly improves texture sharpness, brightness uniformity, and semantic fidelity on the real-world low-light benchmarks U-CALTECH and U-CIFAR, while improving compatibility with downstream vision tasks.
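
To make the text–image alignment concrete, below is a minimal sketch of a CLIP-based alignment loss, assuming PyTorch and the open-source OpenAI CLIP package. The function name, tensor shapes, and choice of a frozen ViT-B/32 encoder are illustrative assumptions, not the authors' released code: the idea is simply to embed each reconstructed image and its scene description with a frozen CLIP model and penalize their cosine distance, which is one plausible way to realize the weak supervision described above.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():   # freeze CLIP; it only provides supervision
    p.requires_grad_(False)

# Normalization constants that CLIP's preprocessing pipeline expects
_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_alignment_loss(recon: torch.Tensor, captions: list[str]) -> torch.Tensor:
    """1 - cosine similarity between reconstructions and scene descriptions.

    recon:    (B, 1 or 3, H, W) reconstructed images in [0, 1]
    captions: B text descriptions of the captured scenes
    """
    if recon.shape[1] == 1:            # spike reconstructions are grayscale
        recon = recon.repeat(1, 3, 1, 1)
    x = F.interpolate(recon, size=(224, 224), mode="bilinear", align_corners=False)
    x = ((x - _MEAN) / _STD).type(model.dtype)
    img = model.encode_image(x)        # gradients flow back to the generator
    txt = model.encode_text(clip.tokenize(captions).to(device))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (1.0 - (img * txt).sum(dim=-1)).mean()
```

In training, a loss of this form would be added to the reconstruction objective so that gradients from the frozen CLIP encoders steer the spike-to-image network toward semantically plausible, text-consistent outputs without any paired ground truth.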

📝 Abstract
Spike cameras, as innovative neuromorphic devices, generate continuous spike streams to capture high-speed scenes with lower bandwidth and higher dynamic range than traditional RGB cameras. However, reconstructing high-quality images from spike input under low-light conditions remains challenging. Conventional learning-based methods often rely on synthetic datasets as supervision for training, yet these approaches falter when dealing with the noisy spikes fired in low-light environments, leading to further performance degradation on real-world datasets. This degradation stems primarily from inadequate noise modeling and the domain gap between synthetic and real data, resulting in recovered images with unclear textures, excessive noise, and diminished brightness. To address these challenges, we introduce SpikeCLIP, a novel spike-to-image reconstruction framework that goes beyond traditional training paradigms. Leveraging the CLIP model's powerful capability to align text and images, we incorporate textual descriptions of the captured scene and unpaired high-quality image datasets as supervision. Experiments on the real-world low-light datasets U-CALTECH and U-CIFAR demonstrate that SpikeCLIP significantly enhances the texture detail and luminance balance of recovered images. Furthermore, the reconstructed images align well with the broader visual features needed for downstream tasks, ensuring more robust and versatile performance in challenging environments.
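
For context on what reconstructing from a spike stream means, the sketch below shows the simplest classical baseline, a firing-rate (spike-count) reconstruction; this is background material, not SpikeCLIP itself. Because each spike-camera pixel integrates incoming light and fires once a threshold is reached, the spike count over a time window is roughly proportional to scene brightness. The 250×400 resolution in the usage example is merely a typical spike-camera size, assumed for illustration.

```python
import numpy as np

def firing_rate_image(spikes: np.ndarray) -> np.ndarray:
    """Coarse image from a binary spike stream via per-pixel firing rate.

    spikes: (T, H, W) array; spikes[t, y, x] == 1 if pixel (y, x) fired
            at time step t.
    Returns an (H, W) image scaled to [0, 1].
    """
    rate = spikes.mean(axis=0)                    # average spikes per step
    return rate / max(float(rate.max()), 1e-8)    # normalize for display

# Toy usage: a random 256-step stream at an assumed 250x400 resolution
stream = (np.random.rand(256, 250, 400) < 0.1).astype(np.float32)
img = firing_rate_image(stream)                   # (250, 400) grayscale image
```

Averaging like this trades temporal resolution for denoising, and it is exactly where the approach breaks down in low light: with few, noisy spikes, the rate estimate is dark and grainy, which motivates learned reconstruction frameworks such as SpikeCLIP.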
Problem

Research questions and friction points this paper is trying to address.

Low-light imaging
Spike camera
Noise reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

SpikeCLIP
Low-light imaging
Image reconstruction
🔎 Similar Papers
No similar papers found.