Odo: Depth-Guided Diffusion for Identity-Preserving Body Reshaping

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for human shape editing suffer from severe distortions in body proportions, texture warping, and background inconsistency, compounded by the absence of large-scale benchmark datasets. To address these challenges, this paper introduces the first large-scale dataset specifically designed for human shape editing. The authors further propose an end-to-end depth-guided diffusion model, Odo: the UNet backbone is frozen to preserve identity, pose, and clothing consistency, while a ControlNet driven by SMPL depth maps is tightly coupled to it, enabling fine-grained semantic control over shape deformation. This architecture substantially improves geometric fidelity and visual realism. Quantitative evaluation demonstrates a per-vertex reconstruction error of only 7.5 mm, significantly lower than the baseline's 13.6 mm, and state-of-the-art performance in both target-shape alignment and generation quality.

📝 Abstract
Human shape editing enables controllable transformation of a person's body shape, such as thin, muscular, or overweight, while preserving pose, identity, clothing, and background. Unlike human pose editing, which has advanced rapidly, shape editing remains relatively underexplored. Current approaches typically rely on 3D morphable models or image warping, often introducing unrealistic body proportions, texture distortions, and background inconsistencies due to alignment errors and deformations. A key limitation is the lack of large-scale, publicly available datasets for training and evaluating body shape manipulation methods. In this work, we introduce the first large-scale dataset of 18,573 images across 1,523 subjects, specifically designed for controlled human shape editing. It features diverse variations in body shape, including fat, muscular, and thin, captured under consistent identity, clothing, and background conditions. Using this dataset, we propose Odo, an end-to-end diffusion-based method that enables realistic and intuitive body reshaping guided by simple semantic attributes. Our approach combines a frozen UNet that preserves fine-grained appearance and background details from the input image with a ControlNet that guides shape transformation using target SMPL depth maps. Extensive experiments demonstrate that our method outperforms prior approaches, achieving per-vertex reconstruction errors as low as 7.5 mm, significantly lower than the 13.6 mm observed in baseline methods, while producing realistic results that accurately match the desired target shapes.
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale datasets for body shape editing
Unrealistic body proportions in current reshaping methods
Need for identity-preserving body shape transformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset for body shape editing
Diffusion-based method with semantic guidance
Combines frozen UNet and ControlNet
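The frozen-UNet-plus-depth-ControlNet combination listed above follows a well-known pattern. The paper's code is not reproduced here; the following PyTorch toy is a minimal sketch of that pattern under stated assumptions (class names, channel sizes, and block structure are illustrative, not Odo's actual implementation). The key ControlNet idea it demonstrates: the control branch's outputs pass through zero-initialized convolutions, so at initialization the branch contributes nothing and generation starts exactly from the frozen backbone's behavior.

```python
import torch
import torch.nn as nn

def zero_conv(ch):
    # 1x1 conv initialized to zero (ControlNet convention): the control
    # branch is a no-op at the start of training
    conv = nn.Conv2d(ch, ch, 1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class FrozenBackbone(nn.Module):
    """Stand-in for the frozen denoising UNet (preserves appearance/background)."""
    def __init__(self, ch=8, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n_blocks))

    def forward(self, x, residuals=None):
        for i, block in enumerate(self.blocks):
            x = torch.relu(block(x))
            if residuals is not None:
                x = x + residuals[i]  # inject per-block control residual
        return x

class DepthControlNet(nn.Module):
    """Trainable branch conditioned on a target-shape (SMPL-style) depth map."""
    def __init__(self, ch=8, n_blocks=3):
        super().__init__()
        self.depth_in = nn.Conv2d(1, ch, 3, padding=1)  # depth map -> features
        self.blocks = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=1) for _ in range(n_blocks))
        self.zero_convs = nn.ModuleList(zero_conv(ch) for _ in range(n_blocks))

    def forward(self, x, depth):
        h = x + self.depth_in(depth)
        residuals = []
        for block, zc in zip(self.blocks, self.zero_convs):
            h = torch.relu(block(h))
            residuals.append(zc(h))  # zero at init
        return residuals

backbone = FrozenBackbone()
control = DepthControlNet()
for p in backbone.parameters():  # freeze: identity/clothing stay intact
    p.requires_grad_(False)

latent = torch.randn(1, 8, 16, 16)  # noisy latent (toy size)
depth = torch.randn(1, 1, 16, 16)   # depth map of the target body shape
out = backbone(latent, control(latent, depth))
```

Because the zero convolutions output zeros at initialization, `out` equals the frozen backbone's plain output before any training step; only the control branch receives gradients.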
Siddharth Khandelwal
Fast Code AI Pvt. Ltd.
Sridhar Kamath
Fast Code AI Pvt. Ltd.
Arjun Jain
Fastcode AI, IISc
Machine Learning · Computer Vision · Computer Graphics