CineLOG: A Training Free Approach for Cinematic Long Video Generation

📅 2025-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current controllable video generation models struggle with fine-grained control over camera trajectories and cinematic genres, hindered by dataset imbalance, severe label noise, and the simulation-to-reality gap. To address these challenges, we propose: (1) the first high-fidelity, uncut, and balanced cinematic video dataset comprising 5,000 professionally produced clips, annotated for 17 standard camera motions and 15 historically significant film genres; (2) a training-agnostic, decoupled four-stage generation paradigm—text → storyboarding → camera-motion planning → synthesis—augmented with a trajectory-guided transition module to ensure spatiotemporal coherence across multi-shot sequences; and (3) a film-theory-informed structured annotation framework. Human evaluation shows a 42% improvement in camera instruction adherence, 91.3% script consistency, and professional-grade visual quality—significantly outperforming state-of-the-art end-to-end text-to-video models.
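To make the annotation framework above concrete, here is an illustrative sketch of what a single CineLOG record could contain. All field names and example values are assumptions for illustration, not the dataset's actual schema.

```python
# Illustrative sketch of one CineLOG annotation record, based on the attributes
# listed in the summary (scene description, camera motion from a 17-class
# taxonomy, genre from 15 film genres). Field names are assumptions.

from dataclasses import dataclass


@dataclass
class CineLogEntry:
    clip_path: str            # path to the uncut, professionally produced clip
    scene_description: str    # detailed textual description of the scene
    camera_motion: str        # one of 17 standard motions, e.g. "pan-left", "dolly-in"
    genre: str                # one of 15 film genres, e.g. "film noir", "western"


example = CineLogEntry(
    clip_path="clips/0001.mp4",
    scene_description="A detective walks down a rain-soaked alley at night.",
    camera_motion="tracking",
    genre="film noir",
)
```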

📝 Abstract
Controllable video synthesis is a central challenge in computer vision, yet current models struggle with fine-grained control beyond textual prompts, particularly for cinematic attributes like camera trajectory and genre. Existing datasets often suffer from severe data imbalance, noisy labels, or a significant simulation-to-real gap. To address this, we introduce CineLOG, a new dataset of 5,000 high-quality, balanced, and uncut video clips. Each entry is annotated with a detailed scene description, explicit camera instructions based on a standard cinematic taxonomy, and a genre label, ensuring balanced coverage across 17 diverse camera movements and 15 film genres. We also present the novel pipeline used to create this dataset, which decouples the complex text-to-video (T2V) generation task into four simpler stages built on more mature technology. To enable coherent multi-shot sequences, we introduce a novel Trajectory-Guided Transition Module that generates smooth spatio-temporal interpolation between shots. Extensive human evaluations show that our pipeline significantly outperforms SOTA end-to-end T2V models in adhering to specific camera and screenplay instructions, while maintaining professional visual quality. All code and data are available at https://cine-log.pages.dev.
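As a rough illustration of the decoupled paradigm described in the abstract (text → storyboarding → camera-motion planning → synthesis), the skeleton below wires the four stages together. Every function here is a hypothetical stub standing in for an off-the-shelf component; none of these names come from the paper.

```python
# Minimal sketch of a decoupled four-stage generation pipeline.
# All functions are hypothetical placeholders, not the authors' actual API.

def generate_storyboard(script: str, genre: str) -> list[dict]:
    # Stages 1-2: a language model would split the script into shots,
    # each with a scene description and a camera-motion label.
    return [{"description": script, "camera_motion": "dolly-in"}]  # stub


def plan_camera_motion(motion_label: str, num_frames: int = 48) -> list[float]:
    # Stage 3: turn the discrete motion label into an explicit per-frame trajectory.
    return [i / (num_frames - 1) for i in range(num_frames)]  # stub: normalized progress


def synthesize_shot(description: str, trajectory: list[float]) -> dict:
    # Stage 4: a trajectory-conditioned video generator would render the clip.
    return {"description": description, "frames": len(trajectory)}  # stub


def generate_video(script: str, genre: str) -> list[dict]:
    shots = []
    for entry in generate_storyboard(script, genre):
        trajectory = plan_camera_motion(entry["camera_motion"])
        shots.append(synthesize_shot(entry["description"], trajectory))
    return shots


if __name__ == "__main__":
    print(generate_video("A detective enters a dim bar.", "film noir"))
```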
Problem

Research questions and friction points this paper is trying to address.

Generates cinematic videos with precise camera control
Addresses data imbalance and noise in video datasets
Enables coherent multi-shot sequences via trajectory guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with balanced cinematic annotations
Pipeline decoupling T2V into four stages
Trajectory-Guided Transition Module for smooth spatio-temporal interpolation (see the sketch after this list)
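The toy sketch below shows the kind of spatio-temporal interpolation a trajectory-guided transition could perform between two shots: camera poses are linearly interpolated from the end of shot A to the start of shot B. This is an assumed simplification of the mechanism, not the paper's actual module.

```python
# Assumed illustration: bridge two shots by interpolating camera positions.
import numpy as np


def transition_trajectory(pose_a: np.ndarray, pose_b: np.ndarray,
                          num_frames: int = 16) -> np.ndarray:
    """Return interpolated camera positions for the transition frames."""
    t = np.linspace(0.0, 1.0, num_frames)[:, None]   # (num_frames, 1) blend weights
    return (1.0 - t) * pose_a + t * pose_b           # (num_frames, 3) interpolated poses


# Example: bridge the end of a dolly-in shot to the start of a pan shot.
end_of_shot_a = np.array([0.0, 1.5, 4.0])
start_of_shot_b = np.array([2.0, 1.5, 6.0])
print(transition_trajectory(end_of_shot_a, start_of_shot_b))
```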