🤖 AI Summary
Existing posterior sampling and maximum a posteriori (MAP) estimation methods that solve image inverse problems with a pre-trained unconditional diffusion model often rely on modeling approximations and can be computationally demanding. This paper proposes the variational mode-seeking loss (VML), which is minimized at each reverse diffusion step to guide the generated sample towards the MAP estimate, with no task-specific training or fine-tuning. VML arises from minimizing the KL divergence between the diffusion posterior $p(\mathbf{x}_0|\mathbf{x}_t)$ and the measurement posterior $p(\mathbf{x}_0|\mathbf{y})$; for linear inverse problems it can be derived analytically and need not be approximated. The resulting algorithm, VML-MAP, is reported to improve on existing methods in both restoration performance and computational time across diverse image-restoration tasks and multiple datasets.
📝 Abstract
A pre-trained unconditional diffusion model, combined with posterior sampling or maximum a posteriori (MAP) estimation techniques, can solve arbitrary inverse problems without task-specific training or fine-tuning. However, existing posterior sampling and MAP estimation methods often rely on modeling approximations and can be computationally demanding. In this work, we propose the variational mode-seeking loss (VML), which, when minimized during each reverse diffusion step, guides the generated sample towards the MAP estimate. VML arises from a novel perspective of minimizing the Kullback-Leibler (KL) divergence between the diffusion posterior $p(\mathbf{x}_0|\mathbf{x}_t)$ and the measurement posterior $p(\mathbf{x}_0|\mathbf{y})$, where $\mathbf{y}$ denotes the measurement. Importantly, for linear inverse problems, VML can be analytically derived and need not be approximated. Based on further theoretical insights, we propose VML-MAP, an empirically effective algorithm for solving inverse problems, and validate its efficacy over existing methods in both performance and computational time, through extensive experiments on diverse image-restoration tasks across multiple datasets.
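To make the KL-minimization idea concrete, the following is a minimal toy sketch, not the paper's VML-MAP algorithm. It assumes a scalar standard-normal prior on $\mathbf{x}_0$, a variance-exploding forward process $x_t = x_0 + \sigma_t \epsilon$, a linear measurement $y = a x_0 + \sigma_y n$, and the KL direction $\mathrm{KL}(p(x_0|x_t)\,\|\,p(x_0|y))$. Under these assumptions both posteriors are Gaussian, so the loss is available in closed form. All function names, the noise schedule, and the step size are hypothetical choices made for illustration.

```python
import math


def diffusion_posterior(x_t, sigma_t):
    """Mean and variance of p(x0 | x_t) for x0 ~ N(0, 1), x_t = x0 + sigma_t * eps."""
    var = sigma_t**2 / (1.0 + sigma_t**2)
    mean = x_t / (1.0 + sigma_t**2)
    return mean, var


def measurement_posterior(y, a, sigma_y):
    """Mean and variance of p(x0 | y) for x0 ~ N(0, 1), y = a * x0 + sigma_y * n."""
    var = sigma_y**2 / (a**2 + sigma_y**2)
    mean = a * y / (a**2 + sigma_y**2)
    return mean, var


def gaussian_kl(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def toy_vml(x_t, sigma_t, y, a, sigma_y):
    """Toy loss: KL between the diffusion posterior and the measurement posterior."""
    m1, v1 = diffusion_posterior(x_t, sigma_t)
    m2, v2 = measurement_posterior(y, a, sigma_y)
    return gaussian_kl(m1, v1, m2, v2)


def toy_map_search(y, a, sigma_y, sigmas=(1.0, 0.5, 0.1, 0.01), inner_steps=50):
    """Minimize the toy loss over x_t at each noise level, from high to low noise.

    The iterate is simply carried over between noise levels; a real sampler
    would also apply a reverse diffusion update here (omitted in this sketch).
    """
    x_t = 0.0
    m2, v2 = measurement_posterior(y, a, sigma_y)
    for sigma_t in sigmas:
        scale = 1.0 + sigma_t**2
        step = 0.5 * v2 * scale**2  # damped Newton step for this quadratic-in-x_t loss
        for _ in range(inner_steps):
            m1, _ = diffusion_posterior(x_t, sigma_t)
            grad = (m1 - m2) / (v2 * scale)  # analytic d(toy_vml)/d(x_t)
            x_t -= step * grad
    return x_t


if __name__ == "__main__":
    y, a, sigma_y = 2.0, 1.0, 0.5
    x_hat = toy_map_search(y, a, sigma_y)
    x_map, _ = measurement_posterior(y, a, sigma_y)  # Gaussian posterior: mode equals mean
    print(f"toy estimate: {x_hat:.4f}, exact MAP: {x_map:.4f}")
```

In this Gaussian toy case the minimizer over $x_t$ at noise level $\sigma_t$ is the measurement-posterior mean scaled by $(1 + \sigma_t^2)$, so the iterate approaches the exact MAP estimate as $\sigma_t$ shrinks. With an image prior given by a trained diffusion model, neither posterior is available in this simple form.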