ReRoPE: Repurposing RoPE for Relative Camera Control

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation methods rely on relative pose encoding with respect to a fixed reference frame for camera viewpoint control, which lacks translational invariance and often leads to poor generalization and cumulative drift. This work proposes a plug-and-play framework that, for the first time, leverages the underutilized low-frequency spectral bands of Rotary Position Embedding (RoPE) to encode relative camera poses between arbitrary viewpoint pairs. The approach enables efficient and controllable fine-tuning of pretrained video diffusion models without modifying their architecture or requiring extensive retraining. Extensive experiments demonstrate that the method achieves high-precision camera control and excellent visual fidelity in both image-to-video and video-to-video tasks, confirming its versatility and effectiveness.

📝 Abstract
Video generation with controllable camera viewpoints is essential for applications such as interactive content creation, gaming, and simulation. Existing methods typically adapt pre-trained video models using camera poses relative to a fixed reference, e.g., the first frame. However, these encodings lack shift-invariance, often leading to poor generalization and accumulated drift. While relative camera pose embeddings defined between arbitrary view pairs offer a more robust alternative, integrating them into pre-trained video diffusion models without prohibitive training costs or architectural changes remains challenging. We introduce ReRoPE, a plug-and-play framework that incorporates relative camera information into pre-trained video diffusion models without compromising their generation capability. Our approach is based on the insight that Rotary Positional Embeddings (RoPE) in existing models underutilize their full spectral bandwidth, particularly in the low-frequency components. By seamlessly injecting relative camera pose information into these underutilized bands, ReRoPE achieves precise control while preserving strong pre-trained generative priors. We evaluate our method on both image-to-video (I2V) and video-to-video (V2V) tasks in terms of camera control accuracy and visual fidelity. Our results demonstrate that ReRoPE offers a training-efficient path toward controllable, high-fidelity video generation. See project page for more results: https://sisyphe-lee.github.io/ReRoPE/
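The abstract's key insight is that standard RoPE assigns geometrically decaying rotation frequencies to channel pairs, so the lowest-frequency pairs barely rotate over a typical video clip, leaving spectral headroom. The sketch below illustrates that headroom numerically; it is not the paper's implementation, and the `head_dim=128` and 81-frame clip length are illustrative assumptions, not settings reported by the authors.

```python
import numpy as np

def rope_frequencies(head_dim: int, base: float = 10000.0) -> np.ndarray:
    """Per-pair rotation frequencies of standard RoPE.

    Channel pair i rotates by angle pos * base**(-2*i/head_dim),
    so frequencies decay geometrically from 1.0 toward ~1/base.
    """
    i = np.arange(head_dim // 2)
    return base ** (-2.0 * i / head_dim)

def total_sweep(freqs: np.ndarray, seq_len: int) -> np.ndarray:
    """Total angle (radians) each pair sweeps across the whole sequence."""
    return freqs * (seq_len - 1)

# Illustrative values (assumptions, not the paper's configuration).
freqs = rope_frequencies(head_dim=128)
sweep = total_sweep(freqs, seq_len=81)  # e.g. an 81-frame latent sequence

# Pairs whose sweep is under 1% of a full turn are effectively unused
# for position encoding over this clip length; these low-frequency bands
# are the kind of headroom ReRoPE repurposes for relative camera pose.
underused = int(np.sum(sweep < 2 * np.pi / 100))
```

Running this shows a sizable tail of channel pairs whose total rotation over the clip is a tiny fraction of one turn, which is the "underutilized spectral bandwidth" the abstract refers to.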

Problem

Research questions and friction points this paper is trying to address.

relative camera control
video generation
diffusion models
camera pose
controllable generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReRoPE
relative camera control
Rotary Positional Embeddings
video diffusion models
plug-and-play