AI Summary
Long-video generation faces fundamental challenges in maintaining multi-character appearance consistency, motion coherence, and scene layout stability beyond 16 seconds: most existing models are limited to 5–16-second clips, while the few approaches supporting up to 150 seconds suffer from high frame redundancy and low temporal diversity. This work systematically reviews 32 studies and proposes the first taxonomy specifically designed for long-duration narrative video generation, uncovering key design principles for temporal consistency and high-fidelity synthesis. Methodologically, we integrate diffusion modeling with autoregressive architecture, incorporating hierarchical temporal modeling, explicit identity preservation mechanisms, and dynamic scene layout optimization. Extensive experiments demonstrate that our approach reliably generates videos ≥150 seconds long, significantly outperforming baselines in character consistency, motion coherence, and visual quality, while reducing frame redundancy by 37%.
Abstract
Despite significant progress in video generative models, existing state-of-the-art methods can only produce videos lasting 5-16 seconds, which are often labeled "long-form videos". Furthermore, methods that generate videos longer than 16 seconds struggle to maintain consistent character appearances and scene layouts throughout the narrative. In particular, multi-subject long videos still fail to preserve character consistency and motion coherence. While some methods can generate videos up to 150 seconds long, they often suffer from frame redundancy and low temporal diversity. Recent work has attempted to produce long-form videos featuring multiple characters, narrative coherence, and high-fidelity detail. We comprehensively study 32 papers on video generation to identify key architectural components and training strategies that consistently yield these qualities. We also construct a novel, comprehensive taxonomy of existing methods and present comparative tables that categorize papers by their architectural designs and performance characteristics.