Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos

📅 2025-01-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address unknown camera poses, complex camera trajectories, and variable-length sequences in long-video 3D reconstruction, this paper proposes Rob-GS, a framework that jointly estimates camera poses and optimizes 3D Gaussian Splatting (3DGS). The method (1) progressively refines camera poses and 3D Gaussian parameters together; (2) tracks the pose between adjacent frames, exploiting the continuity of video for stable estimation; and (3) adaptively splits the video into segments that are optimized separately, so that arbitrarily long inputs can be handled. On Tanks and Temples and a newly collected real-world dataset, the authors report that Rob-GS outperforms existing approaches, reconstructing 3DGS scenes from long videos without poses precomputed by Structure-from-Motion.
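
The summary does not say how each new frame's pose is initialized before refinement. As a rough illustration only, the sketch below uses a generic constant-velocity motion model, which is an assumption here and not necessarily what Rob-GS does. Poses are taken to be 4x4 rigid camera-to-world matrices, and every function name is hypothetical.

```python
# Hypothetical sketch: constant-velocity initialization of the next camera pose.
# Assumption: poses are 4x4 rigid camera-to-world matrices stored as nested lists.
# This is a generic motion model, not the paper's confirmed procedure.

def matmul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rot_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def propagate_pose(prev_pose, curr_pose):
    """Guess the next pose by repeating the last relative motion."""
    relative = matmul(rigid_inverse(prev_pose), curr_pose)
    return matmul(curr_pose, relative)


def translation(x, y, z):
    """Build a pure-translation pose (identity rotation)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


# The camera moved one unit along x between the last two frames,
# so the initial guess for the next frame moves one more unit along x.
prev_pose = translation(0.0, 0.0, 0.0)
curr_pose = translation(1.0, 0.0, 0.0)
next_guess = propagate_pose(prev_pose, curr_pose)
print([row[3] for row in next_guess[:3]])
```

In a joint pose-3DGS pipeline, a guess like this would only be a starting point; the pose would still need to be refined against the rendered scene for the new frame.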

📝 Abstract

3D Gaussian Splatting (3DGS) has emerged as a powerful representation due to its efficiency and high-fidelity rendering. However, 3DGS training requires a known camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Pioneering works have attempted to relax this restriction but still face difficulties when handling long sequences with complex camera trajectories. In this work, we propose Rob-GS, a robust framework to progressively estimate camera poses and optimize 3DGS for arbitrarily long video sequences. Leveraging the inherent continuity of videos, we design an adjacent pose tracking method to ensure stable pose estimation between consecutive frames. To handle arbitrarily long inputs, we adopt a "divide and conquer" scheme that adaptively splits the video sequence into several segments and optimizes them separately. Extensive experiments on the Tanks and Temples dataset and our collected real-world dataset show that Rob-GS outperforms state-of-the-art methods.
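
The abstract does not state the criterion used to split the video adaptively. The following minimal sketch shows one plausible way such a "divide and conquer" split could work, assuming a segment is closed once the camera has travelled farther than a fixed budget or the segment reaches a frame limit. The function `split_into_segments` and the parameters `max_travel` and `max_frames` are hypothetical, as is the use of camera positions as the splitting signal.

```python
import math

# Hypothetical sketch of adaptive segmentation for a long video.
# Assumption: a segment ends when the accumulated camera travel exceeds
# `max_travel` or the segment holds `max_frames` frames. The paper's actual
# splitting rule is not given in the abstract.


def split_into_segments(positions, max_travel=2.0, max_frames=50):
    """Return (start, end) frame index pairs, end exclusive, covering all frames."""
    segments = []
    start = 0
    travel = 0.0
    for i in range(1, len(positions)):
        # Accumulate the distance moved since the current segment began.
        travel += math.dist(positions[i - 1], positions[i])
        if travel > max_travel or i - start >= max_frames:
            segments.append((start, i))
            start = i
            travel = 0.0
    if positions:
        # Close the final, possibly shorter, segment.
        segments.append((start, len(positions)))
    return segments


# Ten frames moving 0.5 units per frame along x.
camera_positions = [(0.5 * i, 0.0, 0.0) for i in range(10)]
segments = split_into_segments(camera_positions, max_travel=2.0)
print(segments)
```

Each resulting segment could then be optimized on its own, which keeps the per-segment cost bounded regardless of the total video length. In a pose-free setting the positions would come from the poses estimated so far, so the split would be decided progressively as frames arrive.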

Problem

Research questions and friction points this paper is trying to address.

3D Gaussian Mapping

Camera Position Independence

Stable Pose and 3D Structure Learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rob-GS

Camera Pose Estimation

3D Gaussian Maps Optimization

🔎 Similar Papers

STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video