VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction

📅 2026-03-06
🤖 AI Summary
This work addresses a fundamental trade-off in video stabilization between geometric robustness and full-frame consistency: conventional 2D methods rely on aggressive cropping, while 3D approaches depend on fragile optimization that breaks down under large camera motions. To overcome these limitations, the authors propose a framework that combines feed-forward deep 3D reconstruction with generative video diffusion. The pipeline jointly estimates camera parameters, depth, and dynamic masks. A hybrid stabilized rendering module then fuses semantic and geometric cues to keep dynamic content consistent. Finally, a dual-stream video diffusion model, guided by both structural cues and semantic anchors, inpaints disoccluded regions and corrects rendering artifacts. The method produces full-frame stabilized video across diverse camera models and, according to the authors, outperforms existing techniques in both robustness and visual quality, particularly under extreme motion.
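Once per-frame camera parameters are estimated, a stabilizer needs a smooth virtual camera path to render toward. The summary does not say how VS3R plans that path, so the sketch below is an assumption: a plain Gaussian low-pass filter over the estimated camera centers. The helper name `smooth_camera_path` and the `sigma` parameter are hypothetical, and rotations (which would need smoothing on SO(3)) are deliberately left out.

```python
import numpy as np


def smooth_camera_path(centers: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    """Low-pass filter a (T, 3) array of per-frame camera centers.

    Hypothetical helper: a Gaussian filter stands in for whatever path
    planner VS3R actually uses. Only translations are smoothed here.
    """
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # normalize so a constant path stays unchanged

    # Edge padding keeps the first/last frames from being pulled toward zero.
    padded = np.pad(centers, ((radius, radius), (0, 0)), mode="edge")

    # "valid" convolution returns exactly T samples per axis after padding.
    smoothed = np.stack(
        [np.convolve(padded[:, k], kernel, mode="valid") for k in range(centers.shape[1])],
        axis=1,
    )
    return smoothed
```

A larger `sigma` removes more shake but lets the virtual camera drift further from the real one, which enlarges the regions that later have to be filled in.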

📝 Abstract
Video stabilization aims to mitigate camera shake but faces a fundamental trade-off between geometric robustness and full-frame consistency. While 2D methods suffer from aggressive cropping, 3D techniques are often undermined by fragile optimization pipelines that fail under extreme motions. To bridge this gap, we propose VS3R, a framework that synergizes feed-forward 3D reconstruction with generative video diffusion. Our pipeline jointly estimates camera parameters, depth, and masks to ensure all-scenario reliability, and introduces a Hybrid Stabilized Rendering module that fuses semantic and geometric cues for dynamic consistency. Finally, a Dual-Stream Video Diffusion Model restores disoccluded regions and rectifies artifacts by synergizing structural guidance with semantic anchors. Collectively, VS3R achieves high-fidelity, full-frame stabilization across diverse camera models and significantly outperforms state-of-the-art methods in robustness and visual quality.
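The abstract's geometric side (rendering each frame into a stabilized camera using estimated depth and poses, then handing the disoccluded holes to a diffusion model) can be illustrated with a generic depth-based forward warp. This is a minimal sketch, not VS3R's Hybrid Stabilized Rendering module: it assumes a single pinhole intrinsic matrix `K` (whereas the paper targets diverse camera models), nearest-pixel splatting, and a z-buffer. The function name `reproject_to_stabilized` and its arguments are hypothetical.

```python
import numpy as np


def reproject_to_stabilized(image, depth, K, R_rel, t_rel):
    """Forward-warp a frame into a stabilized virtual camera (illustrative only).

    image: (H, W, C) source frame; depth: (H, W) per-pixel depth in the source
    camera; K: (3, 3) pinhole intrinsics (an assumption); R_rel, t_rel: maps
    source-camera points into the stabilized camera, X_stab = R_rel @ X_src + t_rel.
    Returns the warped frame and a boolean mask of pixels that received no
    source sample, i.e. the disoccluded holes a generative model would fill.
    """
    H, W = depth.shape
    C = image.shape[2]
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)  # (3, N) homogeneous

    # Back-project every pixel to 3D, move it into the stabilized frame, project.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_stab = R_rel @ pts + t_rel.reshape(3, 1)
    proj = K @ pts_stab
    z = pts_stab[2]
    valid = z > 1e-6  # drop points behind the virtual camera

    uu = np.round(proj[0, valid] / z[valid]).astype(int)
    vv = np.round(proj[1, valid] / z[valid]).astype(int)
    zz = z[valid]
    src = image.reshape(H * W, C)[valid]

    inside = (uu >= 0) & (uu < W) & (vv >= 0) & (vv < H)
    uu, vv, zz, src = uu[inside], vv[inside], zz[inside], src[inside]

    # Z-buffer: sort by target pixel, then by depth, and keep the nearest sample.
    lin = vv * W + uu
    order = np.lexsort((zz, lin))  # primary key: lin, secondary key: zz
    first = np.unique(lin[order], return_index=True)[1]
    keep = order[first]

    out = np.zeros((H * W, C), dtype=image.dtype)
    filled = np.zeros(H * W, dtype=bool)
    out[lin[keep]] = src[keep]
    filled[lin[keep]] = True
    return out.reshape(H, W, C), ~filled.reshape(H, W)
```

Shifting the virtual camera sideways leaves a band of unfilled pixels along one edge; a cropping-based 2D stabilizer would cut that band away, whereas a full-frame method has to synthesize it.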
Problem

Research questions and friction points this paper is trying to address.

video stabilization
camera shake
full-frame consistency
3D reconstruction
geometric robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

video stabilization
3D reconstruction
video diffusion
full-frame consistency
disocclusion inpainting
Muhua Zhu
Hunan University, Changsha, Hunan 410082, China
Xinhao Jin
Hunan University, Changsha, Hunan 410082, China
Yu Zhang
University of Science and Technology of China
Efficient AI Systems, Programming Systems, Program Analysis, Multi-Modal Perception
Yifei Xue
Hunan University, Changsha, Hunan 410082, China
Tie Ji
Hunan University, Changsha, Hunan 410082, China
Yizhen Lao
Professor, School of Design, Hunan University
computer vision, computational imaging, machine learning