🤖 AI Summary
To address pose estimation bias and accumulated 3D reconstruction errors caused by motion blur in monocular dynamic videos, this paper proposes an end-to-end joint optimization framework that treats camera poses as learnable parameters alongside the attributes of 3D Gaussian splatting (3DGS). Camera and object motion are modeled as per-primitive SE(3) transformations on the Gaussians, and training follows a three-stage schedule: (i) optimizing Gaussian attributes with fixed poses, (ii) optimizing poses with fixed Gaussians, and (iii) jointly refining both. This unified scheme limits the propagation of pose errors into the reconstruction under blur degradation. Evaluated on the Stereo Blur dataset and challenging real-world dynamic sequences, the approach achieves significant improvements over prior dynamic deblurring methods in both reconstruction quality (PSNR/SSIM) and pose accuracy (ATE). The results support co-optimizing camera poses and scene geometry within a differentiable 3D Gaussian representation.
📝 Abstract
Reconstructing dynamic 3D scenes from monocular video has broad applications in AR/VR, robotics, and autonomous navigation, but often fails under severe motion blur caused by camera and object motion. Existing methods commonly follow a two-step pipeline in which camera poses are estimated first and 3D Gaussians are optimized afterwards. Because blur artifacts usually undermine pose estimation, pose errors accumulate and degrade the reconstruction. To address this issue, we introduce a unified optimization framework that treats camera poses as learnable parameters alongside the 3D Gaussian Splatting (3DGS) attributes for end-to-end optimization. Specifically, we recast camera and object motion as per-primitive SE(3) affine transformations on 3D Gaussians and formulate a unified optimization objective. For stable optimization, we introduce a three-stage training schedule that optimizes camera poses and Gaussians alternately. First, the 3D Gaussians are trained with the poses fixed; then the poses are optimized with the 3D Gaussians frozen; finally, all learnable parameters are optimized together. Extensive experiments on the Stereo Blur dataset and challenging real-world sequences demonstrate that our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.
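The three-stage schedule described in the abstract can be pictured as a rule that decides, at each training step, which parameter group receives gradient updates. The following is a minimal sketch of that idea only, not the authors' implementation: the class name `StageSchedule`, its fields, and the iteration counts are all hypothetical, and the abstract does not state how long each stage runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSchedule:
    """Hypothetical per-stage iteration budgets (values are NOT from the paper)."""

    gaussian_only_iters: int  # stage 1: train Gaussians, poses fixed
    pose_only_iters: int      # stage 2: train poses, Gaussians frozen
    joint_iters: int          # stage 3: train everything together

    @property
    def total_iters(self) -> int:
        return self.gaussian_only_iters + self.pose_only_iters + self.joint_iters

    def trainable(self, step: int) -> dict:
        """Return which parameter groups should be updated at `step` (0-based)."""
        if not 0 <= step < self.total_iters:
            raise ValueError(f"step {step} is outside [0, {self.total_iters})")
        if step < self.gaussian_only_iters:
            return {"gaussians": True, "poses": False}
        if step < self.gaussian_only_iters + self.pose_only_iters:
            return {"gaussians": False, "poses": True}
        return {"gaussians": True, "poses": True}


# Assumed budgets, chosen only to make the example small.
schedule = StageSchedule(gaussian_only_iters=3, pose_only_iters=2, joint_iters=4)
for step in range(schedule.total_iters):
    flags = schedule.trainable(step)
    print(step, flags)
```

In a real training loop, the returned flags would typically be used to enable or disable gradient updates for each parameter group (for example, by skipping the corresponding optimizer step), so that early pose errors cannot corrupt the Gaussians before both have reached a reasonable state.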