Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating high-quality 3D characters from a single image remains challenging due to complex poses and self-occlusions. This work proposes the RCM framework, which aligns characters in arbitrary poses to a canonical pose and leverages an image-to-video diffusion model enhanced with multi-view conditional control to achieve high-fidelity, view-consistent novel view synthesis and 3D generation. The method supports input images with arbitrarily complex poses, incorporates up to four conditional views, allows controllable initial camera poses, and generates orbit videos at 1024×1024 resolution. Extensive evaluations demonstrate that RCM significantly outperforms state-of-the-art approaches in both visual quality and cross-view consistency.

📝 Abstract
Generating high-quality 3D characters from single images remains a significant challenge in digital content creation, particularly due to complex body poses and self-occlusion. In this paper, we present RCM (Rotate your Character Model), an advanced image-to-video diffusion framework tailored for high-quality novel view synthesis (NVS) and 3D character generation. Compared to existing diffusion-based approaches, RCM offers several key advantages: (1) transferring characters with any complex poses into a canonical pose, enabling consistent novel view synthesis across the entire viewing orbit, (2) high-resolution orbital video generation at 1024x1024 resolution, (3) controllable observation positions given different initial camera poses, and (4) multi-view conditioning supporting up to 4 input images, accommodating diverse user scenarios. Extensive experiments demonstrate that RCM outperforms state-of-the-art methods in both novel view synthesis and 3D generation quality.
Problem

Research questions and friction points this paper is trying to address.

3D character generation
novel view synthesis
self-occlusion
complex poses
image-to-3D
Innovation

Methods, ideas, or system contributions that make the work stand out.

video diffusion models
novel view synthesis
3D character generation
canonical pose alignment
multi-view conditioning
Authors

Jin Wang (Hunyuan, Tencent)
Jianxiang Lu (Hunyuan, Tencent)
Comi Chen (Hunyuan, Tencent)
Guangzheng Xu (Hunyuan, Tencent)
Haoyu Yang (Hunyuan, Tencent)
Peng Chen (Hunyuan, Tencent)
Na Zhang (Hunyuan, Tencent)
Yifan Xu (Hunyuan, Tencent)
Longhuang Wu (Hunyuan, Tencent)
Shuai Shao (Tencent; Computer Vision, Multimedia, AIGC)
Qinglin Lu (Hunyuan, Tencent)
Ping Luo (National University of Defense Technology; distributed_computing)