🤖 AI Summary
Existing video generation methods rely on verbose text prompts, neglect cinematic elements such as camera motion, and lack explicit 3D structural modeling, leading to inter-frame character inconsistency, rigid camera trajectories, and diminished immersion. To address these limitations, we propose an end-to-end, cinematic-grade visual authoring framework tailored for non-expert users, the first to deeply integrate generative AI into the filmmaking pipeline. The framework extracts human and camera poses from reference shots, optimizes the camera trajectory, generates customizable 3D characters from human structural priors, and applies a structure-guided motion transfer strategy with real-time rendering and motion retargeting. It supports free-form camera control and the transfer of classic cinematographic styles. Experiments across diverse scenarios demonstrate substantial improvements in visual fidelity, motion naturalness, and user controllability, enabling highly consistent, temporally smooth, and cinematic video generation.
📝 Abstract
We are living in a flourishing era of digital media, where everyone has the potential to become a personal filmmaker. Current research on cinematic transfer empowers filmmakers to reproduce and manipulate the visual elements (e.g., cinematography and character behaviors) of classic shots. However, characters in the reimagined films still rely on manual crafting, which involves significant technical complexity and high costs, putting it out of reach for ordinary users. Furthermore, the estimated cinematography lacks smoothness due to inadequate capture of inter-frame motion and insufficient modeling of physical camera trajectories. Fortunately, the remarkable success of 2D and 3D AIGC has opened up the possibility of efficiently generating characters tailored to users' needs and of diversifying cinematography. In this paper, we propose DreamCinema, a novel cinematic transfer framework that pioneers the integration of generative AI into the film production paradigm, aiming to facilitate user-friendly film creation. Specifically, we first extract cinematic elements (i.e., human and camera poses) and optimize the camera trajectory. Then, we apply a character generator to efficiently create high-quality 3D characters with a human structure prior. Finally, we develop a structure-guided motion transfer strategy to incorporate the generated characters into film creation and transfer cinematography smoothly via 3D graphics engines. Extensive experiments demonstrate the effectiveness of our method for creating high-quality films with free camera control and 3D characters.
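The abstract does not spell out how the camera trajectory is optimized. As a rough illustration only (not the paper's actual formulation), one common way to smooth a noisy per-frame camera path estimated from video is a least-squares smoother that trades off fidelity to the raw estimates against a penalty on per-frame acceleration:

```python
import numpy as np

def smooth_trajectory(raw, lam=10.0):
    """Smooth per-frame camera positions raw of shape (T, 3) by solving
    min_x ||x - raw||^2 + lam * ||D2 x||^2, where D2 is the
    second-difference (acceleration) operator. Closed-form solution of a
    sparse linear system; lam controls the smoothness/fidelity trade-off."""
    T = raw.shape[0]
    D2 = np.zeros((T - 2, T))
    for i in range(T - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]  # discrete second difference
    A = np.eye(T) + lam * D2.T @ D2        # normal equations
    return np.linalg.solve(A, raw)

# Example: a jittery linear dolly move (straight path plus estimation noise).
t = np.linspace(0.0, 1.0, 50)
raw = np.stack([t, np.zeros(50), 2.0 * t], axis=1)
raw += np.random.default_rng(0).normal(0.0, 0.05, raw.shape)
smoothed = smooth_trajectory(raw)
```

The penalty on second differences suppresses frame-to-frame jitter while keeping the overall path close to the estimates; the parameter `lam` and this particular objective are illustrative assumptions, not details taken from the paper.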