🤖 AI Summary
This work addresses the failure of conventional imaging under extremely low-photon conditions by proposing a novel image reconstruction method based on latent diffusion models. It introduces, for the first time, the semantic priors of large-scale text-to-image diffusion models into single-photon avalanche diode (SPAD) imaging. By designing a generative mechanism tailored to Bernoulli photon statistics and integrating latent-space restoration, spatio-temporal alignment, and joint optimization of denoising and demosaicing, the method enables high-fidelity reconstruction of both color and dynamic scenes. Experiments on synthetic data, the first color SPAD burst dataset, and the Deforming (XD) video benchmark demonstrate that the proposed approach significantly outperforms existing techniques, achieving substantial improvements in both perceptual quality and photometric fidelity.
📝 Abstract
Capturing high-quality images from only a few detected photons is a fundamental challenge in computational imaging. Single-photon avalanche diode (SPAD) sensors promise high-quality imaging in regimes where conventional cameras fail, but raw \emph{quanta frames} contain only sparse, noisy, binary photon detections. Recovering a coherent image from a burst of such frames requires handling alignment, denoising, and demosaicing (for color) under noise statistics far outside those assumed by standard restoration pipelines or modern generative models. We present an approach that adapts large text-to-image latent diffusion models to the photon-limited domain of quanta burst imaging. Our method leverages the structural and semantic priors of internet-scale diffusion models while introducing mechanisms to handle Bernoulli photon statistics. By integrating latent-space restoration with burst-level spatio-temporal reasoning, our approach produces reconstructions that are both photometrically faithful and perceptually pleasing, even under high-speed motion. We evaluate the method on synthetic benchmarks and new real-world datasets, including the first color SPAD burst dataset and a challenging \textit{Deforming (XD)} video benchmark. Across all settings, the approach substantially improves perceptual quality over classical and modern learning-based baselines, demonstrating the promise of adapting large generative priors to extreme photon-limited sensing. Code at \href{https://github.com/Aryan-Garg/gQIR}{https://github.com/Aryan-Garg/gQIR}.
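To make the Bernoulli photon statistics mentioned above concrete, the following is a minimal sketch of the standard quanta-imaging forward model: each pixel of each binary frame fires with probability $p = 1 - e^{-\phi \tau}$ under Poisson photon arrivals with flux $\phi$ and exposure $\tau$, and the flux can be recovered from averaged frames by the maximum-likelihood inverse. Function names are illustrative and not taken from the released gQIR code.

```python
import numpy as np

def simulate_quanta_burst(flux, num_frames, exposure=1.0, rng=None):
    """Simulate a burst of binary SPAD (quanta) frames.

    Each pixel is an independent Bernoulli detection: under Poisson
    photon arrivals, the probability of registering at least one
    photon during the exposure is p = 1 - exp(-flux * exposure).
    """
    rng = np.random.default_rng(rng)
    p = 1.0 - np.exp(-np.asarray(flux, dtype=float) * exposure)
    frames = rng.random((num_frames,) + p.shape) < p  # Bernoulli samples
    return frames.astype(np.uint8)

def mle_flux(frames, exposure=1.0):
    """Maximum-likelihood flux estimate from averaged binary frames,
    inverting p = 1 - exp(-flux * exposure)."""
    mean = frames.mean(axis=0).clip(1e-6, 1.0 - 1e-6)
    return -np.log(1.0 - mean) / exposure
```

A single frame carries almost no photometric information (each pixel is 0 or 1); only by aggregating and aligning many frames, as the paper's burst-level reasoning does, can flux be estimated, which is why noise statistics here fall far outside standard restoration assumptions.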