🤖 AI Summary
Problem: Existing latent diffusion models (LDMs) for image restoration under unknown complex degradations suffer from three key limitations: reliance on predefined degradation operators, unstable latent-space guidance, and high computational overhead due to frequent pixel–latent-space mappings.
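Method: We propose LatentINDIGO, an invertible neural network (INN)-guided latent diffusion framework. A wavelet-inspired INN simulates the degradation through its forward transform and reconstructs lost details through its inverse transform, so no predefined degradation operator is required. The framework comes in two variants: PixelINN, which applies the INN guidance in the pixel domain, and LatentINN, which stays entirely in the latent space to reduce complexity. Both alternate between updating the intermediate latent variables under INN guidance and refining the INN forward model, and a regularization step keeps the latent variables close to the natural image manifold.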
Contribution/Results: LatentINDIGO achieves state-of-the-art performance on both synthetic and real-world low-quality images, supports arbitrary output resolutions, and significantly reduces GPU memory consumption and computational cost compared to conventional LDM-based approaches.
📝 Abstract
There is growing interest in the use of latent diffusion models (LDMs) for image restoration (IR) tasks due to their ability to effectively model the distribution of natural images. While significant progress has been made, key challenges remain. First, many approaches depend on a predefined degradation operator, making them ill-suited for complex or unknown degradations that deviate from standard analytical models. Second, many methods struggle to provide stable guidance in the latent space. Finally, most methods convert latent representations back to the pixel domain for guidance at every sampling iteration, which significantly increases computational and memory overhead. To overcome these limitations, we introduce a wavelet-inspired invertible neural network (INN) that simulates degradations through a forward transform and reconstructs lost details via the inverse transform. We further integrate this design into a latent diffusion pipeline through two proposed approaches: LatentINDIGO-PixelINN, which operates in the pixel domain, and LatentINDIGO-LatentINN, which stays fully in the latent space to reduce complexity. Both approaches alternate between updating intermediate latent variables under the guidance of our INN and refining the INN forward model to handle unknown degradations. In addition, a regularization step preserves the proximity of latent variables to the natural image manifold. Experiments demonstrate that our algorithm achieves state-of-the-art performance on synthetic and real-world low-quality images and can be readily adapted to arbitrary output sizes.
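To make the "wavelet-inspired" forward/inverse idea concrete, here is a minimal sketch of a plain one-level 2D Haar transform in NumPy. It is not the authors' INN: the function names `haar_forward` and `haar_inverse` are hypothetical, nothing here is learned, and the fixed Haar filters are assumed purely for illustration. It only shows the structural property the abstract relies on: the forward pass splits an image into a coarse band (a stand-in for the degraded observation) and detail bands, and the inverse pass recovers the original exactly when the details are available.

```python
import numpy as np


def haar_forward(x):
    """One-level orthonormal 2D Haar transform of an (H, W) array with even H and W.

    Returns a coarse band of shape (H/2, W/2) and a stack of three detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    coarse = (a + b + c + d) / 2.0
    detail = np.stack([
        (a - b + c - d) / 2.0,  # horizontal differences
        (a + b - c - d) / 2.0,  # vertical differences
        (a - b - c + d) / 2.0,  # diagonal differences
    ])
    return coarse, detail


def haar_inverse(coarse, detail):
    """Exact inverse of haar_forward."""
    h, v, dg = detail
    out = np.empty((coarse.shape[0] * 2, coarse.shape[1] * 2), dtype=coarse.dtype)
    out[0::2, 0::2] = (coarse + h + v + dg) / 2.0
    out[0::2, 1::2] = (coarse - h + v - dg) / 2.0
    out[1::2, 0::2] = (coarse + h - v - dg) / 2.0
    out[1::2, 1::2] = (coarse - h - v + dg) / 2.0
    return out


# Toy 8x8 "image" (random values, for illustration only).
rng = np.random.default_rng(0)
image = rng.random((8, 8))

coarse, detail = haar_forward(image)

# With the detail bands, the inverse recovers the input exactly.
restored = haar_inverse(coarse, detail)

# Without them (details zeroed), information is lost, which is the gap
# a restoration method has to fill.
blurry = haar_inverse(coarse, np.zeros_like(detail))
```

In the paper's setting, the analogous forward mapping is a trainable network refined during sampling to match the unknown degradation, rather than a fixed filter bank as assumed in this sketch.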