AI Summary
UHD image restoration faces an inherent trade-off between computational efficiency and faithful preservation of high-frequency detail. To address this, we propose Latent Harmony, a two-stage latent-space framework. In the first stage, we design the LH-VAE encoder, which jointly incorporates visual semantic constraints and degradation-aware perturbations to make latent representations more discriminative. In the second stage, we introduce HF-LoRA, a lightweight adapter module for high-frequency residual modeling, together with a perception-driven LoRA-tuned decoder; the fidelity-perception balance is explicitly controlled via an α-parameterized high-frequency alignment loss and selective gradient propagation. Evaluated on both UHD and standard-resolution benchmarks, Latent Harmony achieves state-of-the-art performance with significantly reduced computational overhead, yielding reconstructions that combine structural accuracy with textural realism.
Abstract
Ultra-High Definition (UHD) image restoration faces a trade-off between computational efficiency and high-frequency detail retention. While Variational Autoencoders (VAEs) improve efficiency via latent-space processing, their Gaussian constraint often discards degradation-specific high-frequency information, hurting reconstruction fidelity. To overcome this, we propose Latent Harmony, a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. In Stage One, we introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradation perturbations, while latent equivariance strengthens high-frequency reconstruction. Stage Two jointly trains this refined VAE with a restoration model using High-Frequency Low-Rank Adaptation (HF-LoRA): an encoder LoRA guided by a fidelity-oriented high-frequency alignment loss to recover authentic details, and a decoder LoRA driven by a perception-oriented loss to synthesize realistic textures. Both LoRA modules are trained via alternating optimization with selective gradient propagation to preserve the pretrained latent structure. At inference, a tunable parameter α enables flexible fidelity-perception trade-offs. Experiments show that Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
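To make the two key ingredients concrete, here is a minimal numpy sketch of (a) the standard LoRA low-rank weight update that HF-LoRA builds on, and (b) an α-weighted objective combining pixel fidelity with a high-frequency alignment term. The paper does not specify its HF operator or exact loss form, so the box-blur high-pass filter and the function names (`lora_delta`, `restoration_loss`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lora_delta(W, A, B, scale=1.0):
    """Standard LoRA update: W' = W + scale * (B @ A).
    A: (r, k) down-projection, B: (d, r) up-projection, with rank r << min(d, k),
    so only r*(d + k) extra parameters are trained instead of d*k."""
    return W + scale * (B @ A)

def high_pass(img):
    """Crude high-frequency extractor: image minus a 3x3 box blur.
    A stand-in for whatever HF operator the paper's alignment loss uses."""
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    blur = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return img - blur

def restoration_loss(pred, target, alpha):
    """Hypothetical α-parameterized objective: L1 pixel fidelity plus an
    α-scaled high-frequency alignment term. Larger α pushes the model
    toward sharper (fidelity-oriented) high-frequency reconstruction."""
    l1 = np.abs(pred - target).mean()
    hf = np.abs(high_pass(pred) - high_pass(target)).mean()
    return l1 + alpha * hf
```

Under this reading, sweeping α at inference (or training separate encoder/decoder LoRAs against α=large and α=small variants of the loss) traces out the fidelity-perception trade-off the abstract describes.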