🤖 AI Summary
Diffusion models frequently produce severe structural distortions in hands during human image generation, hindering practical deployment. To address this, RHanDS is a conditional diffusion framework with a decoupled dual-guidance mechanism: a 3D hand mesh reconstructed from the malformed hand supplies structure guidance, while the malformed hand itself supplies style guidance. To reduce mutual interference between the two guidance signals, a two-stage training strategy is used: the first stage trains on paired hand images to ensure stylistic consistency, and the second stage trains on hands generated from human meshes to gain control over hand structure, supported by a purpose-built series of multi-style hand datasets. Experiments show that RHanDS corrects hand structure while preserving the hand's style, keeping texture and lighting consistent with the input image. A sketch of the dual-guidance wiring follows.
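As a concrete illustration of the decoupled guidance, the following PyTorch sketch shows one way the two branches could be wired into a diffusion denoiser. All names (`DualGuidanceDenoiser`, `structure_encoder`, `style_encoder`), channel sizes, and injection points are assumptions for illustration, not the paper's released code; the backbone `unet` is assumed to take `(sample, timestep, encoder_hidden_states)` and return a noise prediction.

```python
# Hypothetical sketch of decoupled structure/style guidance (not the
# authors' code): the mesh render and the malformed hand crop are encoded
# by separate branches, so geometry and appearance cues stay decoupled.
import torch
import torch.nn as nn

class DualGuidanceDenoiser(nn.Module):
    def __init__(self, unet: nn.Module, latent_ch: int = 4, ctx_dim: int = 768):
        super().__init__()
        self.unet = unet  # assumed: unet(x, t, encoder_hidden_states) -> noise
        # Structure branch: a rendered hand-mesh map (assumed pre-resized to
        # latent resolution) becomes an additive spatial residual on the latent.
        self.structure_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )
        # Style branch: the malformed hand crop becomes a set of tokens
        # consumed by the U-Net's cross-attention layers.
        self.style_encoder = nn.Conv2d(3, ctx_dim, kernel_size=8, stride=8)

    def forward(self, x_t, t, mesh_render, hand_crop):
        structure = self.structure_encoder(mesh_render)                   # B,C,h,w
        style = self.style_encoder(hand_crop).flatten(2).transpose(1, 2)  # B,N,D
        # Geometry is injected spatially; style is passed as the
        # cross-attention context in place of text embeddings.
        return self.unet(x_t + structure, t, encoder_hidden_states=style)
```

Keeping the two encoders separate is what "decoupled" refers to here: the structure signal never sees appearance, and the style signal never sees geometry, which is what the two-stage training below exploits.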
📝 Abstract
Although diffusion models can generate high-quality human images, their applications are limited by instability in generating structurally correct hands. In this paper, we introduce RHanDS, a conditional diffusion-based framework designed to refine malformed hands using decoupled structure and style guidance. The hand mesh reconstructed from the malformed hand offers structure guidance for correcting the hand's structure, while the malformed hand itself provides style guidance for preserving the hand's style. To alleviate the mutual interference between style and structure guidance, we introduce a two-stage training strategy and build a series of multi-style hand datasets. In the first stage, we train on paired hand images to ensure stylistic consistency during hand refinement. In the second stage, hand images in various styles generated from human meshes are used for training, enabling the model to gain control over hand structure. Experimental results demonstrate that RHanDS can effectively refine hand structure while preserving consistency in hand style.
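To make the two-stage recipe concrete, here is a hedged sketch of the training loop under standard epsilon-prediction diffusion training. The dataset variables, the `denoiser` signature, and the diffusers-style `scheduler.add_noise` / `scheduler.config.num_train_timesteps` interface are assumptions for illustration; the abstract does not specify the paper's actual losses or schedules.

```python
# Hypothetical two-stage training loop (illustrative assumptions, not the
# authors' code). denoiser(x_t, t, mesh_render, hand_crop) is assumed to
# return a noise prediction, e.g. the DualGuidanceDenoiser sketched above.
import torch
import torch.nn.functional as F

def diffusion_loss(denoiser, x0, mesh_render, hand_crop, scheduler):
    """Standard epsilon-prediction loss with a diffusers-style scheduler."""
    noise = torch.randn_like(x0)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (x0.shape[0],), device=x0.device)
    x_t = scheduler.add_noise(x0, noise, t)
    return F.mse_loss(denoiser(x_t, t, mesh_render, hand_crop), noise)

def train_two_stage(denoiser, opt, scheduler, paired_loader, mesh_loader):
    # Stage 1: paired hand images of the same style teach the model to
    # preserve hand style while refining (stylistic consistency).
    for src_hand, tgt_hand, mesh in paired_loader:
        loss = diffusion_loss(denoiser, tgt_hand, mesh, src_hand, scheduler)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: multi-style hands generated from human meshes teach the
    # model to follow the mesh geometry (structure control). Using the
    # target itself as the style reference is a simplification here; in
    # practice it would likely be degraded or augmented to avoid copying.
    for hand, mesh in mesh_loader:
        loss = diffusion_loss(denoiser, hand, mesh, hand, scheduler)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The ordering mirrors the abstract: style consistency is learned first from paired data, and structural controllability is learned second from mesh-generated data, so each stage trains one guidance pathway with minimal interference from the other.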