🤖 AI Summary
This work addresses the limited fidelity of existing methods for restoring degraded human-centric images, particularly in human body restoration (HBR). It proposes LCUDiff, a one-step restoration framework that upgrades the latent space of a pre-trained latent diffusion model from 4 to 16 channels. Its key components are channel splitting distillation (CSD), which during VAE fine-tuning keeps the first four channels aligned with the pre-trained prior while the additional channels encode high-frequency details; prior-preserving adaptation (PPA), which bridges the mismatch between the 4-channel diffusion backbone and the 16-channel latent space; and a decoder router (DeR), which selects a decoder per sample using restoration-quality score annotations. On synthetic and real-world datasets, LCUDiff achieves competitive results with higher fidelity and fewer artifacts under mild degradations, while retaining one-step inference efficiency.
📝 Abstract
Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity, particularly in human body restoration (HBR). Recent diffusion-based restoration methods commonly adapt pre-trained text-to-image diffusion models, whose variational autoencoder (VAE) can significantly bottleneck restoration fidelity. We propose LCUDiff, a stable one-step framework that upgrades a pre-trained latent diffusion model from a 4-channel latent space to a 16-channel latent space. For VAE fine-tuning, channel splitting distillation (CSD) keeps the first four channels aligned with the pre-trained prior while allocating the additional channels to encode high-frequency details. We further design prior-preserving adaptation (PPA) to smoothly bridge the mismatch between 4-channel diffusion backbones and the higher-dimensional 16-channel latent space. In addition, we propose a decoder router (DeR) that routes each sample to a decoder using restoration-quality score annotations, which improves visual quality across diverse conditions. Experiments on synthetic and real-world datasets show competitive results, with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency. The code and model will be available at https://github.com/gobunu/LCUDiff.
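To make the CSD and DeR ideas above more concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation. It assumes that "aligned" means a mean squared error between the first four channels of the 16-channel latent and a 4-channel latent from the pre-trained VAE, and that the router picks whichever decoder has the highest predicted quality score. The names `channel_split_distill_loss` and `route_decoder`, and the decoder labels in the usage note, are hypothetical.

```python
from typing import Mapping, Sequence

# A latent is modelled as a list of channels, each a flat list of floats.
Latent = Sequence[Sequence[float]]


def _mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized channels."""
    if len(a) != len(b) or not a:
        raise ValueError("channels must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def channel_split_distill_loss(
    student: Latent, teacher: Latent, n_prior: int = 4
) -> float:
    """Assumed alignment term: average MSE over the first `n_prior` channels.

    `student` is the higher-dimensional latent (e.g. 16 channels) and
    `teacher` is the latent of the pre-trained VAE (`n_prior` channels).
    Channels beyond `n_prior` are deliberately left unconstrained here,
    so they stay free to carry extra detail.
    """
    if len(teacher) != n_prior:
        raise ValueError("teacher must have exactly n_prior channels")
    if len(student) < n_prior:
        raise ValueError("student must have at least n_prior channels")
    return sum(_mse(student[c], teacher[c]) for c in range(n_prior)) / n_prior


def route_decoder(scores: Mapping[str, float]) -> str:
    """Assumed routing rule: choose the decoder with the highest score."""
    if not scores:
        raise ValueError("at least one decoder score is required")
    return max(scores, key=lambda name: scores[name])
```

For example, `route_decoder({"pretrained": 0.4, "finetuned": 0.7})` selects `"finetuned"`. In practice the scores would come from a model trained on the restoration-quality annotations mentioned in the abstract; the abstract does not specify how that model is built or which loss CSD actually uses.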