🤖 AI Summary
This work addresses three key challenges in dynamic video re-rendering: (1) difficulty in spatiotemporal alignment, (2) misapplication of RoPE (Rotary Position Embedding) for camera-conditioned modeling, and (3) poor generalization to variable-length videos. To this end, we propose Rotary Camera Encoding (RoCE), a novel camera-conditioned positional encoding mechanism. RoCE incorporates camera pose parameters into the phase shift of RoPE, enabling robust modeling of out-of-distribution camera trajectories and arbitrarily long videos. By explicitly encoding multi-view geometric relationships between the input and target videos, RoCE improves dynamic object localization accuracy and background consistency, and its integration into Transformer-based architectures yields spatiotemporally coherent generation. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches across diverse camera motions and video lengths in camera controllability, geometric consistency, and visual fidelity.
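The summary describes conditioning RoPE's phase on camera pose. The paper's exact formulation is not given here, so the following is only a minimal sketch of the general idea: standard RoPE rotates feature pairs by position-dependent angles, and a camera-derived phase offset (here a hypothetical `cam_phase` array, assumed to be produced elsewhere from camera extrinsics) is added to those angles before the rotation is applied.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Standard RoPE: rotation angle per position and frequency pair."""
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,) frequencies
    return np.outer(positions, freqs)               # (n, dim/2) angles

def apply_rotation(x, angles):
    """Rotate consecutive feature pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def camera_conditioned_rope(x, positions, cam_phase):
    """Sketch of a camera-conditioned phase shift: cam_phase (n, dim/2)
    is a hypothetical per-token offset derived from camera pose; it is
    simply added to the standard RoPE angles before rotating."""
    angles = rope_angles(positions, x.shape[-1]) + cam_phase
    return apply_rotation(x, angles)
```

With `cam_phase` set to zero this reduces to plain RoPE, and because each feature pair undergoes a pure rotation, token norms are preserved regardless of the camera condition.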
📝 Abstract
We present ReDirector, a novel camera-controlled video retake generation method for dynamically captured variable-length videos. In particular, we rectify a common misuse of RoPE in previous works by aligning the spatiotemporal positions of the input video and the target retake. Moreover, we introduce Rotary Camera Encoding (RoCE), a camera-conditioned RoPE phase shift that captures and integrates multi-view relationships within and across the input and target videos. By integrating camera conditions into RoPE, our method generalizes to out-of-distribution camera trajectories and video lengths, yielding improved dynamic object localization and static background preservation. Extensive experiments further demonstrate significant improvements in camera controllability, geometric consistency, and video quality across various trajectories and lengths.