ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

📅 2026-04-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing methods struggle to achieve high appearance fidelity, natural motion dynamics, and controllable camera viewpoints at the same time in human video synthesis when multi-view data are limited. This work proposes an “image-first” generation paradigm: it first uses a pre-trained image generation model to learn a high-quality human appearance prior, then combines SMPL-X pose conditioning with a pre-trained video diffusion model. A training-free temporal refinement stage then yields pose- and viewpoint-controllable, high-fidelity video. By decoupling appearance modeling from temporal consistency, the method aims to improve both visual quality and controllability. The authors also release a canonical human dataset and an auxiliary model for compositional human image synthesis to support future research.

πŸ“ Abstract
Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods often address these factors separately, resulting in limited controllability or reduced visual quality. We revisit this problem from an image-first perspective, where high-quality human appearance is learned via image generation and used as a prior for video synthesis, decoupling appearance modeling from temporal consistency. We propose a pose- and viewpoint-controllable pipeline that combines a pretrained image backbone with SMPL-X-based motion guidance, together with a training-free temporal refinement stage based on a pretrained video diffusion model. Our method produces high-quality, temporally consistent videos under diverse poses and viewpoints. We also release a canonical human dataset and an auxiliary model for compositional human image synthesis. Code and data are publicly available at https://github.com/Taited/ReImagine.
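
The decoupling described in the abstract can be illustrated with a toy sketch: stage 1 renders each frame independently from its pose condition, and stage 2 passes all frames jointly through a frozen temporal prior without any training. Everything here is an assumption made for illustration: `toy_image_model` and `toy_video_prior` are hypothetical stand-ins (per-frame flicker and a simple temporal filter), not the paper's image backbone, SMPL-X guidance, or video diffusion model, and `temporal_jitter` is an ad hoc measure rather than a metric from the paper.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image: a short list of pixel values


def generate_frames(poses: Sequence[int],
                    image_model: Callable[[int], Frame]) -> List[Frame]:
    """Stage 1 (image-first): render every frame independently from its pose."""
    return [image_model(pose) for pose in poses]


def refine_temporally(frames: List[Frame],
                      video_prior: Callable[[List[Frame]], List[Frame]]) -> List[Frame]:
    """Stage 2: run all frames jointly through a frozen prior; nothing is trained."""
    return video_prior(frames)


def toy_image_model(pose: int) -> Frame:
    """Hypothetical per-frame generator: pose-driven content plus frame-wise flicker."""
    flicker = 0.1 if pose % 2 == 0 else -0.1
    return [0.1 * pose + flicker, 1.0 - 0.1 * pose + flicker]


def toy_video_prior(frames: List[Frame]) -> List[Frame]:
    """Hypothetical temporal prior: a [1/4, 1/2, 1/4] filter along the time axis."""
    out = [list(f) for f in frames]  # first and last frames are copied unchanged
    for t in range(1, len(frames) - 1):
        out[t] = [0.25 * a + 0.5 * b + 0.25 * c
                  for a, b, c in zip(frames[t - 1], frames[t], frames[t + 1])]
    return out


def temporal_jitter(frames: List[Frame]) -> float:
    """Mean absolute second difference over time; 0 for perfectly linear motion."""
    total, count = 0.0, 0
    for t in range(1, len(frames) - 1):
        for a, b, c in zip(frames[t - 1], frames[t], frames[t + 1]):
            total += abs(a - 2.0 * b + c)
            count += 1
    return total / count


poses = list(range(8))  # toy pose sequence; the paper conditions on SMPL-X poses
raw = generate_frames(poses, toy_image_model)
refined = refine_temporally(raw, toy_video_prior)
print(temporal_jitter(raw), temporal_jitter(refined))
```

In this toy setting the refined sequence has lower jitter than the independently generated frames, while the per-frame content is still produced entirely by stage 1. That mirrors the split the abstract describes between appearance modeling and temporal consistency.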

Problem

Research questions and friction points this paper is trying to address.

human video generation
appearance modeling
motion modeling
camera viewpoint
temporal consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

image-first synthesis
controllable human video generation
SMPL-X motion guidance
temporal refinement
pretrained diffusion models