🤖 AI Summary
This work addresses two limitations of existing burst image super-resolution methods: they often produce over-smoothed results in complex texture regions, and they fail to leverage pretrained generative priors. To overcome these issues, the authors propose BurstGP, a diffusion-based approach that combines video-level sRGB generative priors with raw burst inputs. The method introduces a degradation-aware conditioning mechanism and a robust sRGB-to-linear-RGB inversion module, and it exploits multi-frame alignment cues to improve the synthesis of fine detail. BurstGP produces sharper textures and structures while preserving high fidelity. It outperforms state-of-the-art methods on perceptual metrics such as MUSIQ and LPIPS, yielding more photorealistic super-resolution reconstructions.
📝 Abstract
Burst image super-resolution (BISR) aims to reconstruct a single high-resolution (HR) image by aggregating information from multiple low-resolution (LR) frames, relying on temporal redundancy and spatial coherence across the burst. While conventional methods achieve impressive results, they often struggle with complex textures and produce oversmoothed outputs. Diffusion models, particularly those pretrained on high-quality data, have shown a remarkable ability to generate realistic details for image and video super-resolution. However, their potential remains largely unexplored in BISR, where existing approaches typically rely on task-specific diffusion models trained from scratch and operate on single-frame reconstructions. In this work, we propose BurstGP, a novel diffusion-based solution for BISR that leverages the generative priors of recent foundation models to overcome these issues. In particular, we build a multiframe-aware diffusion model on top of a conventional BISR approach, which boosts image quality with minimal loss of fidelity. Further, we introduce (i) a novel degradation-aware conditioning mechanism, which controls the synthesis of fine details based on the degradation estimated in the input, and (ii) a robust sRGB-to-linear-RGB (lRGB) inverter, which enables us to use generative multiframe (video) sRGB priors while operating on raw input and lRGB output images. Empirically, we demonstrate that BurstGP outperforms the existing state of the art both quantitatively (especially on perceptual metrics, including MUSIQ and LPIPS) and qualitatively. In particular, our method excels at recovering richer textures and finer structural details, highlighting the potential of video priors for BISR over traditional methods.
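To make the sRGB versus linear RGB (lRGB) distinction concrete, here is a minimal sketch of the standard analytic sRGB transfer function (IEC 61966-2-1) and its inverse. This is not the authors' inverter. Real camera pipelines also apply white balance, color-correction matrices, and tone mapping, which a fixed curve cannot undo; this is presumably why the abstract calls for a *robust* inversion. The function names `srgb_to_linear` and `linear_to_srgb` are hypothetical and chosen for illustration.

```python
import numpy as np


def srgb_to_linear(srgb):
    """Decode sRGB-encoded values in [0, 1] to linear RGB (IEC 61966-2-1 curve)."""
    c = np.clip(np.asarray(srgb, dtype=np.float64), 0.0, 1.0)
    # Linear segment near black, power-law segment elsewhere.
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)


def linear_to_srgb(linear):
    """Encode linear RGB values in [0, 1] back to sRGB (inverse of the curve above)."""
    lin = np.clip(np.asarray(linear, dtype=np.float64), 0.0, 1.0)
    return np.where(lin <= 0.0031308, 12.92 * lin, 1.055 * lin ** (1.0 / 2.4) - 0.055)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((4, 4, 3))  # stand-in for an sRGB image in [0, 1]
    recovered = linear_to_srgb(srgb_to_linear(img))
    print(np.max(np.abs(recovered - img)))  # near-zero round-trip error
    print(float(srgb_to_linear(0.5)))       # mid-gray sRGB maps to about 0.214 linear
```

Because the curve is strongly nonlinear (sRGB mid-gray 0.5 corresponds to only about 21% linear intensity), a generative prior trained on sRGB frames sees a different value distribution than raw or lRGB data. Some conversion step is therefore needed before such priors can be applied to raw bursts.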