ShoulderShot: Generating Over-the-Shoulder Dialogue Videos

📅 2025-08-10
🤖 AI Summary
This work addresses three key challenges in video generation for over-the-shoulder dialogue scenes: character inconsistency, spatial discontinuity, and high computational cost in long-duration, multi-turn generation. To this end, we propose a dual-camera collaborative generation framework coupled with a cyclic video expansion strategy. Methodologically, our approach integrates diffusion-model-driven dual-view video synthesis, a temporal cyclic architecture, and pose-consistency constraints to enforce inter-frame character identity and spatial coherence. To the best of our knowledge, this is the first method to achieve stable shot-reverse-shot composition and natural spatial continuity in generative video. Experiments demonstrate significant improvements in shot structural plausibility, cross-frame character consistency, and spatial coherence, while enabling arbitrary-length dialogue generation without compromising visual quality. Our method outperforms existing approaches and is directly applicable to practical scenarios such as film previsualization and intelligent advertising.

📝 Abstract
Over-the-shoulder dialogue videos are essential in films, short dramas, and advertisements, providing visual variety and enhancing viewers' emotional connection. Despite their importance, such dialogue scenes remain largely underexplored in video generation research. The main challenges include maintaining character consistency across different shots, creating a sense of spatial continuity, and generating long, multi-turn dialogues within limited computational budgets. Here, we present ShoulderShot, a framework that combines dual-shot generation with looping video, enabling extended dialogues while preserving character consistency. Our results demonstrate capabilities that surpass existing methods in terms of shot-reverse-shot layout, spatial continuity, and flexibility in dialogue length, thereby opening up new possibilities for practical dialogue video generation. Videos and comparisons are available at https://shouldershot.github.io.
Problem

Research questions and friction points this paper is trying to address.

Generating consistent character videos in dialogue scenes
Maintaining spatial continuity across over-the-shoulder shots
Producing long multi-turn dialogues with limited resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-shot generation for character consistency
Looping video for extended dialogues
Computationally efficient multi-turn dialogue generation
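
The looping idea in the list above can be illustrated with a small scheduling sketch. This is not the authors' implementation: it assumes, purely for illustration, that each of the two shots has already been generated as a seamless loop of `loop_len` frames, and that a dialogue is a sequence of turns with a known frame count. The names `Turn` and `schedule_frames` are hypothetical. Under those assumptions, a dialogue of any length can reuse the same two clips by indexing frames modulo the loop length.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    speaker: str     # "A" or "B": which over-the-shoulder shot to show
    num_frames: int  # hypothetical: frames needed to cover this turn


def schedule_frames(turns: List[Turn], loop_len: int) -> List[Tuple[str, int]]:
    """Map each output frame to (shot, index into that shot's looping clip).

    Assumption: both shots share one global clock, so cutting between them
    keeps the scene's timing aligned, and the index wraps around loop_len.
    """
    if loop_len <= 0:
        raise ValueError("loop_len must be positive")
    schedule = []
    t = 0  # global frame counter across the whole dialogue
    for turn in turns:
        for _ in range(turn.num_frames):
            schedule.append((turn.speaker, t % loop_len))
            t += 1
    return schedule


if __name__ == "__main__":
    dialogue = [Turn("A", 3), Turn("B", 4), Turn("A", 2)]
    for shot, idx in schedule_frames(dialogue, loop_len=4):
        print(shot, idx)
```

The cost of generating the clips is paid once per shot in this sketch, while the output length grows only with the schedule, which is one way to read the "limited computational budgets" goal stated in the abstract.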
Authors

Yuang Zhang, Shanghai Jiao Tong University
Junqi Cheng, Tencent
Haoyu Zhao, Tencent
Jiaxi Gu, Huawei Noah's Ark Lab (vision-language pre-training, multimodal learning, generative models)
Fangyuan Zou, Tencent
Zenghui Lu, Tencent
Peng Shu, Tencent