🤖 AI Summary
Monocular drone videos of large-scale dynamic scenes suffer from depth ambiguity and unstable motion estimation, which makes 4D reconstruction ill-posed. To address this, the work proposes AeroDGS, a framework that jointly reconstructs static backgrounds and dynamic objects using a Monocular Geometry Lifting module. It also integrates, for the first time, three differentiable physical priors (ground support, upright stability, and trajectory smoothness) into the optimization of dynamic 3D Gaussian splatting, yielding physically consistent and temporally coherent reconstructions. Evaluated on both synthetic and real-world aerial datasets, the method outperforms existing approaches in 4D reconstruction fidelity. The work also introduces a new real-world benchmark that covers multiple altitudes and motion conditions for comprehensive evaluation.
📝 Abstract
Recent advances in 4D scene reconstruction have significantly improved dynamic modeling across various domains. However, existing approaches remain limited under aerial conditions with single-view capture, wide spatial range, and dynamic objects of limited spatial footprint and large motion disparity. These challenges cause severe depth ambiguity and unstable motion estimation, making monocular aerial reconstruction inherently ill-posed. To this end, we present AeroDGS, a physics-guided 4D Gaussian splatting framework for monocular UAV videos. AeroDGS introduces a Monocular Geometry Lifting module that reconstructs reliable static and dynamic geometry from a single aerial sequence, providing a robust basis for dynamic estimation. To further resolve monocular ambiguity, we propose a Physics-Guided Optimization module that incorporates differentiable ground-support, upright-stability, and trajectory-smoothness priors, transforming ambiguous image cues into physically consistent motion. The framework jointly refines static backgrounds and dynamic entities with stable geometry and coherent temporal evolution. We additionally build a real-world UAV dataset that spans various altitudes and motion conditions to evaluate dynamic aerial reconstruction. Experiments on synthetic and real UAV scenes demonstrate that AeroDGS outperforms state-of-the-art methods, achieving superior reconstruction fidelity in dynamic aerial environments.
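The abstract names three physical priors but does not give their formulas. The sketch below is only an illustration of how such priors are commonly written as penalty terms, not the AeroDGS formulation. The function names, the choice of a flat ground plane at a fixed height, the world up axis along +z, and the use of squared or cosine penalties are all assumptions made for this example. NumPy is used for readability; the same expressions could be written in an autodiff framework to obtain gradients.

```python
import numpy as np

# Illustrative only: these penalties are assumptions, not the paper's losses.

def ground_support_loss(bottom_heights, ground_height=0.0):
    """Mean squared gap between each object's lowest point and an
    assumed flat ground plane at `ground_height`."""
    gap = np.asarray(bottom_heights, dtype=float) - ground_height
    return float(np.mean(gap ** 2))

def upright_stability_loss(up_axes, world_up=(0.0, 0.0, 1.0)):
    """Mean of 1 - cos(tilt), where tilt is the angle between each
    object's up axis (rows of `up_axes`) and the assumed world up axis."""
    up_axes = np.asarray(up_axes, dtype=float)
    up_axes = up_axes / np.linalg.norm(up_axes, axis=-1, keepdims=True)
    w = np.asarray(world_up, dtype=float)
    w = w / np.linalg.norm(w)
    return float(np.mean(1.0 - up_axes @ w))

def trajectory_smoothness_loss(positions):
    """Mean squared second finite difference (a discrete acceleration)
    of a (T, 3) trajectory; requires T >= 3."""
    positions = np.asarray(positions, dtype=float)
    accel = positions[2:] - 2.0 * positions[1:-1] + positions[:-2]
    return float(np.mean(np.sum(accel ** 2, axis=-1)))

if __name__ == "__main__":
    # A hypothetical object moving at constant velocity, upright, on the ground.
    track = np.stack([np.arange(5.0), np.zeros(5), np.zeros(5)], axis=-1)
    print(ground_support_loss([0.0, 0.0]))
    print(upright_stability_loss([[0.0, 0.0, 1.0]]))
    print(trajectory_smoothness_loss(track))
```

In an optimization loop, terms like these would typically be added to the photometric loss with hand-tuned weights, so that a floating, tilted, or jittering object costs more than a grounded, upright, smoothly moving one even when both explain the image equally well.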