SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control

📅 2026-05-26
🤖 AI Summary
Existing video generation methods rely on sparse conditioning signals, such as text prompts or first/last frames, and struggle to precisely control narrative structure and temporal pacing. This work proposes SmartDirector, a two-stage, multi-keyframe-guided generation framework: first, Director-Gen produces a low-resolution video conditioned on the provided keyframes; then, Director-SR refines it, using high-resolution keyframes as semantic anchors to recover fine-grained visual details. The framework supports single-shot generation, multi-shot narrative synthesis, and video extension. To enable multi-keyframe training, the authors also construct a data pipeline that curates single-shot and multi-shot sequences from movies. The authors report that SmartDirector substantially outperforms existing state-of-the-art approaches.
📝 Abstract
The narrative quality of a video fundamentally determines its perceptual value. Although existing video generation methods can produce visually appealing content, they predominantly rely on sparse conditioning signals such as text prompts or first/last frames, which limits precise control over narrative structure and temporal pacing. In this paper, we propose SmartDirector, a framework that enhances the narrative capacity of video generation models through multiple keyframes. SmartDirector supports flexible generation scenarios including single-shot generation, multi-shot narrative synthesis, and video extension. The framework operates in two stages: Director-Gen generates a low-resolution video conditioned on the provided keyframes, and Director-SR refines the output by exploiting high-resolution keyframes as semantic anchors to recover fine-grained details. To enable robust multi-keyframe training, we construct a data pipeline that curates single-shot and multi-shot sequences from movies. Extensive experiments demonstrate that SmartDirector substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research.
Problem

Research questions and friction points this paper is trying to address.

narrative structure
temporal pacing
video generation
keyframe conditioning
cinematic video
Innovation

Methods, ideas, or system contributions that make the work stand out.

keyframe-conditioned
narrative pacing control
cinematic video generation
two-stage refinement
multi-shot synthesis