🤖 AI Summary
Generating high-quality 3D characters from a single image remains challenging due to complex poses and self-occlusions. This work proposes the RCM framework, which aligns characters in arbitrary poses to a canonical pose and leverages an image-to-video diffusion model enhanced with multi-view conditional control to achieve high-fidelity, view-consistent novel view synthesis and 3D generation. The method accepts characters in arbitrarily complex poses, incorporates up to four conditional views, allows the initial camera pose to be controlled, and generates orbit videos at 1024×1024 resolution. Extensive evaluations demonstrate that RCM significantly outperforms state-of-the-art approaches in both visual quality and cross-view consistency.
📝 Abstract
Generating high-quality 3D characters from single images remains a significant challenge in digital content creation, particularly due to complex body poses and self-occlusion. In this paper, we present RCM (Rotate your Character Model), an image-to-video diffusion framework tailored for high-quality novel view synthesis (NVS) and 3D character generation. Compared with existing diffusion-based approaches, RCM offers several key advantages: (1) it transforms characters in arbitrarily complex poses into a canonical pose, enabling consistent novel view synthesis across the entire viewing orbit; (2) it generates high-resolution orbital videos at 1024×1024 resolution; (3) it provides controllable observation positions through user-specified initial camera poses; and (4) it supports multi-view conditioning with up to four input images, accommodating diverse user scenarios. Extensive experiments demonstrate that RCM outperforms state-of-the-art methods in both novel view synthesis and 3D generation quality.
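To make the "controllable initial camera pose" and "orbit video" claims concrete, below is a minimal sketch (not the authors' released code) of how a full 360° orbit of camera poses around a canonicalized, origin-centered character could be parameterized, starting from a user-chosen azimuth and elevation. All function names and parameters are illustrative assumptions.

```python
# Hypothetical sketch: sample camera-to-world poses along an orbit around a
# character assumed to be centered at the origin after canonicalization.
import numpy as np

def look_at(eye: np.ndarray, target: np.ndarray, up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world matrix looking from `eye` toward `target`."""
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = -forward          # OpenGL-style convention (assumption)
    pose[:3, 3] = eye
    return pose

def orbit_poses(num_frames=24, radius=2.0, elevation_deg=10.0, start_azimuth_deg=0.0):
    """Evenly sample a full 360-degree orbit, starting at a controllable azimuth."""
    poses = []
    elev = np.radians(elevation_deg)
    for i in range(num_frames):
        azim = np.radians(start_azimuth_deg) + 2.0 * np.pi * i / num_frames
        eye = radius * np.array([np.cos(elev) * np.sin(azim),
                                 np.sin(elev),
                                 np.cos(elev) * np.cos(azim)])
        poses.append(look_at(eye, target=np.zeros(3)))
    return np.stack(poses)          # (num_frames, 4, 4) camera-to-world matrices

# Example: a 24-frame orbit whose first frame matches the input image's viewpoint.
cams = orbit_poses(num_frames=24, start_azimuth_deg=30.0)
```

In a pipeline like the one described above, such per-frame poses would serve as camera conditioning for the image-to-video diffusion model, with each of the (up to four) conditional views associated with its own known pose.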