🤖 AI Summary
To address the challenge of generating high-fidelity full-body motion videos from handheld capture with a single smartphone, without fixed cameras, complex setups, or repeated rehearsals, this paper proposes an end-to-end video diffusion framework. Methodologically, it combines front and back mirror selfies with an IMU motion reference through a parameter-free frame generation strategy; introduces a multi-reference attention mechanism to integrate appearance from both selfies; adds an image-based fine-tuning strategy to enhance frame sharpness and the realism of shadows and reflections; and employs joint lighting-geometry rendering to keep illumination and shadows consistent in new scenes. Experiments demonstrate new state-of-the-art performance in pose coherence, dynamic shadow modeling, and specular reflection synthesis. The method improves the photorealism and generalization of full-body motion video generation under unconstrained mobile capture conditions.
📝 Abstract
Self-captured full-body videos are popular, but most deployments require mounted cameras, carefully framed shots, and repeated practice. We propose a more convenient solution that enables full-body video capture using handheld mobile devices. Our approach takes as input two static photos (front and back) of you in a mirror, along with an IMU motion reference that you perform while holding your mobile phone, and synthesizes a realistic video of you performing a similar target motion. We enable rendering into a new scene, with consistent illumination and shadows. We propose a novel video diffusion-based model to achieve this. Specifically, we propose a parameter-free frame generation strategy, as well as a multi-reference attention mechanism, that effectively integrate appearance information from both the front and back selfies into the video diffusion model. Additionally, we introduce an image-based fine-tuning strategy to enhance frame sharpness and improve the generation of shadows and reflections, achieving a more realistic human-scene composition.
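The abstract does not spell out how the multi-reference attention mechanism is built. As a rough illustration only, one common way to let a frame draw on several reference images is to concatenate the reference tokens along the key/value axis and run ordinary scaled dot-product attention over the merged set. The sketch below assumes exactly that; the function name `multi_reference_attention`, the token layout, and the omission of learned projections are all hypothetical choices made for brevity, not the paper's design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_reference_attention(queries, front_tokens, back_tokens):
    """Attend from frame tokens to the union of front and back reference tokens.

    Assumptions (not from the paper): keys and values are the reference
    tokens themselves, with no learned projections, and the two references
    are merged by concatenation along the token axis.
    """
    refs = front_tokens + back_tokens  # one shared key/value set
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot product between this query and every reference token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in refs]
        weights = softmax(scores)
        # Weighted average of the reference tokens.
        out = [sum(w * v[d] for w, v in zip(weights, refs)) for d in range(dim)]
        outputs.append(out)
    return outputs


# Toy example: one frame token that closely matches the first front token.
front = [[1.0, 0.0], [0.0, 1.0]]
back = [[-1.0, 0.0]]
frame = [[10.0, 0.0]]
attended = multi_reference_attention(frame, front, back)
```

Because both references share a single softmax, each frame token can pull appearance from whichever view matches it best, for example back-view tokens when the subject turns away from the camera.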