PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

📅 2025-11-21
🤖 AI Summary
Existing video re-rendering methods suffer from low viewpoint-control accuracy and poor detail fidelity when editing camera trajectories in dynamic scenes, primarily because of coarse motion injection strategies. To address this, we propose a post-capture editing framework for 6-degree-of-freedom (6-DoF) camera trajectory control. Our approach introduces a query-shared cross-attention module that jointly models pose priors and 2D visual features, incorporates explicit 6-DoF pose injection with frame-conditioned guidance, and adopts a two-stage training strategy to enhance motion consistency and rendering quality. Evaluated on both real-world and synthetic datasets, our method outperforms state-of-the-art approaches by over 20% in camera control accuracy and view consistency while achieving the highest generation quality, improving both the controllability and the photorealism of novel-view videos in dynamic scenes.
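The summary mentions explicit 6-DoF pose injection but does not say how a pose is encoded. The sketch below assumes each pose arrives as an axis-angle rotation plus a translation (3 + 3 degrees of freedom). It builds the 3x4 camera extrinsic [R | t] with Rodrigues' formula and flattens it into a 12-value per-frame vector. The function names and the flattened encoding are hypothetical illustrations, not PostCam's documented input format.

```python
import numpy as np

def axis_angle_to_matrix(rotvec):
    """Rodrigues' formula: rotation vector (unit axis * angle in radians) -> 3x3 rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)  # zero rotation
    k = rotvec / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose_to_token(rotvec, translation):
    """Hypothetical per-frame pose encoding: flatten the 3x4 extrinsic [R | t] row by row."""
    R = axis_angle_to_matrix(rotvec)
    t = np.asarray(translation, dtype=float).reshape(3, 1)
    return np.hstack([R, t]).reshape(-1)  # 12 values per frame

# Toy trajectory: the camera yaws about the y-axis while moving along x.
trajectory = [pose_to_token([0.0, 0.1 * i, 0.0], [0.2 * i, 0.0, 0.0]) for i in range(4)]
pose_tokens = np.stack(trajectory)  # shape (num_frames, 12)
print(pose_tokens.shape)  # (4, 12)
```

A 4x4 homogeneous matrix or another rotation parameterization would work equally well here. The flat 3x4 form was chosen only because it is compact and unambiguous.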

📝 Abstract
We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods rely on suboptimal camera motion injection strategies, which not only limit camera control precision but also produce videos that fail to preserve fine visual details from the source video. To achieve more accurate and flexible motion manipulation, PostCam introduces a query-shared cross-attention module that integrates two distinct forms of control signal: 6-DoF camera poses and 2D rendered video frames. By fusing them into a unified representation within a shared feature space, the model can extract underlying motion cues, which improves both control precision and generation quality. Furthermore, we adopt a two-stage training strategy: the model first learns coarse camera control from pose inputs, and then incorporates visual information to refine motion accuracy and enhance visual fidelity. Experiments on both real-world and synthetic datasets demonstrate that PostCam outperforms state-of-the-art methods by over 20% in camera control precision and view consistency, while achieving the highest video generation quality. Our project webpage is publicly available at: https://cccqaq.github.io/PostCam.github.io/
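The abstract does not describe the module's internals. This sketch assumes one possible reading of "query-shared" for illustration only. In that reading, a single set of queries is computed once from the video latent tokens. Those shared queries attend separately to the pose tokens and to the rendered-frame tokens, and the two attended outputs are summed into one fused feature. The code implements that reading with single-head scaled dot-product attention in numpy. All names, shapes, the residual connection, and the summation rule are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def query_shared_cross_attention(video_tokens, pose_tokens, frame_tokens, params):
    """Hypothetical fusion: one query projection shared by a pose branch and a frame branch."""
    q = video_tokens @ params["W_q"]  # computed once, reused by both branches
    pose_out = attend(q, pose_tokens @ params["W_k_pose"], pose_tokens @ params["W_v_pose"])
    frame_out = attend(q, frame_tokens @ params["W_k_frame"], frame_tokens @ params["W_v_frame"])
    # Both branch outputs live in the same d-dimensional space, so they can be summed
    # and projected back onto the video tokens as a residual update.
    return video_tokens + (pose_out + frame_out) @ params["W_o"]

rng = np.random.default_rng(0)
d, d_pose = 16, 12  # feature width and per-frame pose width (both assumed)
params = {
    "W_q": rng.normal(size=(d, d)) / np.sqrt(d),
    "W_k_pose": rng.normal(size=(d_pose, d)) / np.sqrt(d_pose),
    "W_v_pose": rng.normal(size=(d_pose, d)) / np.sqrt(d_pose),
    "W_k_frame": rng.normal(size=(d, d)) / np.sqrt(d),
    "W_v_frame": rng.normal(size=(d, d)) / np.sqrt(d),
    "W_o": rng.normal(size=(d, d)) / np.sqrt(d),
}
video = rng.normal(size=(32, d))       # 32 video latent tokens
poses = rng.normal(size=(8, d_pose))   # one pose token per frame, 8 frames
frames = rng.normal(size=(64, d))      # 64 rendered-frame patch tokens
fused = query_shared_cross_attention(video, poses, frames, params)
print(fused.shape)  # (32, 16)
```

Sharing the query projection means both control signals are read out against the same representation of the video being generated. This is one way to place them in a common feature space. Separate queries per branch, or concatenating pose and frame tokens into a single key/value set, are equally plausible alternatives.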
Problem

Research questions and friction points this paper is trying to address.

Enables post-capture editing of camera trajectories in dynamic scenes
Improves camera control precision and preserves fine visual details
Integrates 6-DoF camera poses with 2D frames for motion manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Query-shared cross-attention module for motion control
Fuses 6-DoF camera poses with 2D video frames
Two-stage training strategy for refined motion accuracy
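The two-stage strategy in the abstract first learns coarse camera control from poses, then adds visual information. It can be sketched as a schedule that decides which control signals the model receives at each training step. The class name, the step-based stage boundary, and the on/off switch for the frame branch are hypothetical. The source gives no step counts and does not say how the visual branch is disabled during the first stage.

```python
from dataclasses import dataclass

@dataclass
class TwoStageSchedule:
    """Hypothetical schedule: stage 1 uses poses only; stage 2 adds rendered frames."""
    stage1_steps: int  # assumed length of the pose-only stage

    def stage(self, step: int) -> int:
        return 1 if step < self.stage1_steps else 2

    def active_controls(self, step: int) -> dict:
        # Stage 1: learn coarse camera control from 6-DoF poses alone.
        # Stage 2: also condition on 2D rendered frames to refine motion and detail.
        return {"pose": True, "frames": self.stage(step) == 2}

schedule = TwoStageSchedule(stage1_steps=1000)
for step in (0, 999, 1000, 5000):
    print(step, schedule.stage(step), schedule.active_controls(step))
```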
Yipeng Chen
State Key Lab of CAD&CG, Zhejiang University
Zhichao Ye
Unknown affiliation
Zhenzhou Fang
State Key Lab of CAD&CG, Zhejiang University
Xinyu Chen
State Key Lab of CAD&CG, Zhejiang University
Xiaoyu Zhang
Shanghai InSpatio Intelligent Technology Co., Ltd.
Jialing Liu
Shanghai InSpatio Intelligent Technology Co., Ltd.
Nan Wang
Shanghai InSpatio Intelligent Technology Co., Ltd.
Haomin Liu
Sensetime
SLAM, Structure from Motion
Guofeng Zhang
State Key Lab of CAD&CG, Zhejiang University