🤖 AI Summary
Existing Retinex-based diffusion models achieve strong performance in low-light image enhancement but require hundreds of iterative sampling steps, hindering practical deployment. To address this, we introduce consistency modeling into conditional enhancement tasks for the first time, proposing a single-step consistency model tailored for Retinex decomposition. Methodologically, we design a dual-objective consistency loss and an adaptive noise-augmented sampling strategy to overcome the bias of conventional consistency training toward low-noise regions. We further integrate stochastic time sampling, temporal consistency constraints, ground-truth alignment supervision, and noise-sensitive weight scheduling. On VE-LOL-L, our method achieves state-of-the-art performance in a single sampling step (PSNR: 25.51, FID: 44.73), surpassing Diff-Retinex++ while reducing training cost to merely 1/8 that of a 1000-step baseline.
📝 Abstract
Diffusion models have achieved remarkable success in low-light image enhancement through Retinex-based decomposition, yet their requirement for hundreds of iterative sampling steps severely limits practical deployment. While recent consistency models offer promising one-step generation for *unconditional synthesis*, their application to *conditional enhancement* remains unexplored. We present **Consist-Retinex**, the first framework adapting consistency modeling to Retinex-based low-light enhancement. Our key insight is that conditional enhancement requires fundamentally different training dynamics than unconditional generation: standard consistency training focuses on low-noise regions near the data manifold, while conditional mapping critically depends on large-noise regimes that bridge degraded inputs to enhanced outputs. We introduce two core innovations: (1) a **dual-objective consistency loss** combining temporal consistency with ground-truth alignment under randomized time sampling, providing full-spectrum supervision for stable convergence; and (2) an **adaptive noise-emphasized sampling strategy** that prioritizes training on large-noise regions essential for one-step conditional generation. On VE-LOL-L, Consist-Retinex achieves **state-of-the-art performance with single-step sampling** (**PSNR: 25.51 vs. 23.41, FID: 44.73 vs. 49.59** compared to Diff-Retinex++), while requiring only **1/8 of the training budget** of the 1000-step Diff-Retinex baseline.
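To make the two core ideas concrete, here is a minimal PyTorch sketch of a dual-objective consistency loss (temporal consistency against an EMA teacher plus ground-truth alignment) combined with a timestep sampler skewed toward large-noise regions. The function names, the power-law skew parameter `gamma`, the linear noise schedule, and the weighting `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sample_noise_emphasized_t(batch, T=1000, gamma=2.0, device="cpu"):
    # Hypothetical sampler: raise uniform draws to the power 1/gamma so
    # that gamma > 1 biases sampled timesteps toward T (large noise).
    u = torch.rand(batch, device=device)
    return (u ** (1.0 / gamma) * T).long().clamp(1, T - 1)

def dual_objective_loss(model, ema_model, x0, cond, T=1000, lam=1.0):
    """Sketch of a dual-objective consistency loss:
    (1) temporal consistency between adjacent timesteps (student vs. EMA teacher),
    (2) ground-truth alignment of the one-step prediction with the clean target.
    The noise schedule here is a toy linear one, assumed for illustration."""
    t = sample_noise_emphasized_t(x0.shape[0], T, device=x0.device)
    noise = torch.randn_like(x0)
    sigma_t = (t.float() / T).view(-1, 1, 1, 1)
    sigma_s = ((t - 1).float() / T).view(-1, 1, 1, 1)
    x_t = x0 + sigma_t * noise            # noisy sample at step t
    x_s = x0 + sigma_s * noise            # same trajectory at step t-1
    pred_t = model(x_t, t, cond)          # student prediction at t
    with torch.no_grad():
        pred_s = ema_model(x_s, t - 1, cond)  # EMA teacher at t-1
    loss_consistency = F.mse_loss(pred_t, pred_s)  # temporal consistency
    loss_gt = F.mse_loss(pred_t, x0)               # ground-truth alignment
    return loss_consistency + lam * loss_gt
```

In this reading, the ground-truth term supervises the full noise spectrum directly, while the skewed sampler concentrates gradient signal on the large-noise regimes that a one-step conditional mapping must traverse.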