MemCam: Memory-Augmented Camera Control for Consistent Video Generation

📅 2026-03-27
🤖 AI Summary
This work addresses the degradation of scene consistency in interactive video generation under dynamic camera control, particularly over long durations. To mitigate this issue, the authors propose a memory-augmented architecture that treats previously generated frames as external memory and combines context compression with a co-visibility-based historical frame selection mechanism. This design enables efficient retrieval of long-range, relevant, and compact context that conditions generation along the controlled camera trajectory. According to the authors, the approach reduces computational overhead while improving scene consistency in long videos and under large camera rotations, outperforming existing baselines and open-source state-of-the-art methods on scene-consistency metrics.
📝 Abstract
Interactive video generation has significant potential for scene simulation and video creation. However, existing methods often struggle with maintaining scene consistency during long video generation under dynamic camera control due to limited contextual information. To address this challenge, we propose MemCam, a memory-augmented interactive video generation approach that treats previously generated frames as external memory and leverages them as contextual conditioning to achieve controllable camera viewpoints with high scene consistency. To enable longer and more relevant context, we design a context compression module that encodes memory frames into compact representations and employs co-visibility-based selection to dynamically retrieve the most relevant historical frames, thereby reducing computational overhead while enriching contextual information. Experiments on interactive video generation tasks show that MemCam significantly outperforms existing baseline methods as well as open-source state-of-the-art approaches in terms of scene consistency, particularly in long video scenarios with large camera rotations.
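The co-visibility-based selection mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how co-visibility is computed, so the example below assumes a simplified 2D setting in which each camera is a hypothetical `(x, y, yaw_deg)` tuple, the scene is a set of sample points, and co-visibility is the fraction of points seen by the query camera that a memory camera also sees. All function names and parameters (`visible_points`, `covisibility`, `select_memory_frames`, `fov_deg`, `max_dist`, `k`) are illustrative assumptions.

```python
import math


def visible_points(cam, points, fov_deg=90.0, max_dist=10.0):
    """Return indices of the points inside a camera's 2D view cone.

    cam is an assumed (x, y, yaw_deg) tuple; this is a toy camera model.
    """
    x, y, yaw_deg = cam
    half_fov = math.radians(fov_deg) / 2.0
    yaw = math.radians(yaw_deg)
    seen = set()
    for i, (px, py) in enumerate(points):
        dx, dy = px - x, py - y
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > max_dist:
            continue
        # Angle between the view direction and the point, wrapped to [-pi, pi].
        diff = math.atan2(dy, dx) - yaw
        diff = math.atan2(math.sin(diff), math.cos(diff))
        if abs(diff) <= half_fov:
            seen.add(i)
    return seen


def covisibility(query_cam, memory_cam, points, **kw):
    """Fraction of the query camera's visible points also seen by memory_cam."""
    q = visible_points(query_cam, points, **kw)
    if not q:
        return 0.0
    m = visible_points(memory_cam, points, **kw)
    return len(q & m) / len(q)


def select_memory_frames(query_cam, memory_cams, points, k=2, **kw):
    """Return indices of the k memory frames with the highest co-visibility."""
    scored = [
        (covisibility(query_cam, cam, points, **kw), idx)
        for idx, cam in enumerate(memory_cams)
    ]
    # Highest score first; ties are broken by the older (smaller) index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:k]]


# Toy scene: an 11 x 11 grid of points around the origin.
points = [(float(i), float(j)) for i in range(-5, 6) for j in range(-5, 6)]
# Four previously generated frames, all at the origin with different yaw angles.
memory_cams = [(0.0, 0.0, 0.0), (0.0, 0.0, 90.0), (0.0, 0.0, 180.0), (0.0, 0.0, 10.0)]
query_cam = (0.0, 0.0, 5.0)

selected = select_memory_frames(query_cam, memory_cams, points, k=2)
print(selected)
```

In this toy setup the two memory frames whose yaw is closest to the query view are retrieved, while the frame facing the opposite direction shares no visible points and scores zero. A real system would presumably work with 3D camera poses and frusta, and would pass the selected frames through the compression module before using them as conditioning.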
Problem

Research questions and friction points this paper is trying to address.

video generation
scene consistency
camera control
contextual information
interactive video
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-augmented video generation
camera control
scene consistency
context compression
co-visibility-based selection
Xinhang Gao
Guilin University of Electronic Technology, China
Junlin Guan
Guilin University of Electronic Technology, China
Shuhan Luo
Guilin University of Electronic Technology, China
Wenzhuo Li
Guilin University of Electronic Technology, China
Guanghuan Tan
Guilin University of Electronic Technology, China
Jiacheng Wang
Nanyang Technological University
ISAC, GenAI, Low-altitude wireless network, Semantic Communications