LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration

📅 2026-02-04
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work addresses the limited fidelity of existing methods for human body restoration (HBR), where the variational autoencoder (VAE) of an adapted text-to-image diffusion model can bottleneck restoration quality. The authors propose LCUDiff, a one-step restoration framework that upgrades the latent space of a pre-trained latent diffusion model from 4 to 16 channels. Its key components are channel splitting distillation (CSD), which keeps the first four channels aligned with the pre-trained prior during VAE fine-tuning while the additional channels encode high-frequency details; prior-preserving adaptation (PPA), which bridges the mismatch between the 4-channel diffusion backbone and the 16-channel latent; and a decoder router (DeR), which selects a decoder per sample using restoration-quality score annotations. Experiments on synthetic and real-world datasets show competitive results with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency.

📝 Abstract
Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity, particularly in human body restoration (HBR). Recent diffusion-based restoration methods commonly adapt pre-trained text-to-image diffusion models, where the variational autoencoder (VAE) can significantly bottleneck restoration fidelity. We propose LCUDiff, a stable one-step framework that upgrades a pre-trained latent diffusion model from a 4-channel latent space to a 16-channel latent space. For VAE fine-tuning, channel splitting distillation (CSD) is used to keep the first four channels aligned with pre-trained priors while allocating the additional channels to effectively encode high-frequency details. We further design prior-preserving adaptation (PPA) to smoothly bridge the mismatch between 4-channel diffusion backbones and the higher-dimensional 16-channel latent. In addition, we propose a decoder router (DeR) for per-sample decoder routing using restoration-quality score annotations, which improves visual quality across diverse conditions. Experiments on synthetic and real-world datasets show competitive results with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency. The code and model will be available at https://github.com/gobunu/LCUDiff.
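To make the channel-splitting idea concrete, here is a minimal sketch of one way an alignment term for the first four latent channels could look. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss, so the mean-squared-error objective, the function names (`split_channels`, `csd_alignment_loss`), and the flattened list-of-lists latent layout are all hypothetical. Only the 4-channel and 16-channel widths come from the abstract.

```python
from typing import List, Tuple

# Latent layout assumed for illustration: [channels][flattened spatial positions].
Latent = List[List[float]]

PRIOR_CHANNELS = 4   # latent width of the pre-trained VAE (from the abstract)
TOTAL_CHANNELS = 16  # upgraded latent width (from the abstract)


def split_channels(latent: Latent) -> Tuple[Latent, Latent]:
    """Split a 16-channel latent into (prior-aligned, extra) channel groups."""
    if len(latent) != TOTAL_CHANNELS:
        raise ValueError(f"expected {TOTAL_CHANNELS} channels, got {len(latent)}")
    return latent[:PRIOR_CHANNELS], latent[PRIOR_CHANNELS:]


def csd_alignment_loss(student: Latent, teacher: Latent) -> float:
    """Hypothetical alignment term: MSE between the first four student
    channels and the latent of the frozen 4-channel teacher encoder.
    The remaining twelve channels are left unconstrained by this term."""
    if len(teacher) != PRIOR_CHANNELS:
        raise ValueError(f"expected {PRIOR_CHANNELS} teacher channels, got {len(teacher)}")
    aligned, _extra = split_channels(student)
    squared_error = 0.0
    count = 0
    for s_chan, t_chan in zip(aligned, teacher):
        if len(s_chan) != len(t_chan):
            raise ValueError("student and teacher channels differ in spatial size")
        for s, t in zip(s_chan, t_chan):
            squared_error += (s - t) ** 2
            count += 1
    return squared_error / count


# Usage: the extra channels may hold anything without affecting the term.
teacher_latent = [[1.0, 2.0] for _ in range(PRIOR_CHANNELS)]
student_latent = [list(c) for c in teacher_latent] + [[9.0, 9.0] for _ in range(12)]
loss = csd_alignment_loss(student_latent, teacher_latent)
```

In this sketch only the first four channels are penalized for drifting from the teacher, so the extra channels stay free to be shaped by whatever reconstruction objective the fine-tuning uses; how the paper actually weights or combines these terms is not stated in the abstract.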
Problem

Research questions and friction points this paper is trying to address.

Human Body Restoration
Diffusion Models
Latent Space
Image Restoration
VAE Bottleneck
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Diffusion
Channel Splitting Distillation
Prior-Preserving Adaptation
Decoder Router
Human Body Restoration

Jue Gong
Shanghai Jiao Tong University
Computer Vision, Image Restoration
Zihan Zhou
Shanghai Jiao Tong University
Jingkai Wang
Shanghai Jiao Tong University
Shu Li
Shenzhen Transsion Holdings Co., Ltd.
Libo Liu
Shenzhen Transsion Holdings Co., Ltd.
Jianliang Lan
Shenzhen Transsion Holdings Co., Ltd.
Yulun Zhang
Associate Professor, Shanghai Jiao Tong University
Computer Vision, Image Restoration, Model Compression, Large Language Model