🤖 AI Summary
Reconstructing dynamic 3D Gaussian Splatting (3DGS) from low-frame-rate RGB video and asynchronous event streams is challenging due to motion-induced solution-space ambiguity, color-agnostic events, and inherent modality heterogeneity.
Method: We propose a multimodal co-optimization framework that (i) leverages LoCM-based unsupervised fine-tuning to extract high-frequency motion priors from events and establish event-Gaussian motion correspondence; (ii) bridges RGB–event modality gaps via geometry-aware data association, explicit motion decomposition, and cross-frame pseudo-labeling.
Contribution/Results: To our knowledge, this is the first work to employ event-driven deformation fields for guiding 3DGS optimization, significantly improving reconstruction accuracy and robustness in dynamic scenes. Extensive experiments on both synthetic and real-world datasets demonstrate superior performance over state-of-the-art single-modality (RGB-only or event-only) and multimodal fusion approaches, validating the efficacy of event-derived motion priors for dynamic 3DGS training.
📄 Abstract
Reconstructing dynamic 3D Gaussian Splatting (3DGS) from low-frame-rate RGB videos is challenging because large inter-frame motions enlarge the uncertainty of the solution space: a pixel in one frame may correspond to many candidate pixels in the next. Event cameras asynchronously capture rapid visual changes and are robust to motion blur, but they provide no color information. Intuitively, event trajectories can impose deterministic constraints on large inter-frame motion, so combining low-temporal-resolution images with high-frame-rate event streams can address this challenge. However, jointly optimizing dynamic 3DGS from both RGB and event data is difficult due to the significant discrepancy between the two modalities. This paper introduces a novel framework that jointly optimizes dynamic 3DGS from the two modalities. The key idea is to adopt event motion priors to guide the optimization of the deformation fields. First, we extract the motion priors encoded in event streams using the proposed LoCM unsupervised fine-tuning framework, which adapts an event flow estimator to an unseen scene. Then, we present a geometry-aware data association method that builds the event-Gaussian motion correspondence, the primary foundation of the pipeline, accompanied by two complementary strategies: motion decomposition and inter-frame pseudo-labels. Extensive experiments show that our method outperforms existing image- and event-based approaches on synthetic and real scenes, demonstrating that event data can effectively guide the optimization of dynamic 3DGS.
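To make the core idea concrete, here is a minimal sketch (not the paper's actual implementation) of how an event-derived motion prior could constrain a deformation field: the 2D displacement of projected Gaussian centers between two timestamps is compared against per-Gaussian flow sampled from an event-based flow estimator. The function name, the shapes, and the assumption that each Gaussian is already associated with an event-flow vector are all hypothetical simplifications of the geometry-aware data association described above.

```python
import numpy as np

def motion_prior_loss(gauss_xy_t0, gauss_xy_t1, event_flow):
    """Hypothetical event-motion-prior loss.

    gauss_xy_t0, gauss_xy_t1: (N, 2) projected Gaussian centers at two
        consecutive timestamps, as produced by the deformation field.
    event_flow: (N, 2) per-Gaussian 2D flow sampled from an event-based
        flow estimator at the t0 locations (association assumed given).

    Returns the mean squared disagreement between the motion implied by
    the deformation field and the motion observed in the event stream.
    """
    pred_flow = gauss_xy_t1 - gauss_xy_t0   # motion implied by deformation
    residual = pred_flow - event_flow       # disagreement with event flow
    return float(np.mean(np.sum(residual ** 2, axis=1)))

# Toy usage: two Gaussians moving one pixel each.
xy0 = np.zeros((2, 2))
xy1 = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = motion_prior_loss(xy0, xy1, xy1 - xy0)   # flows agree -> 0.0
mismatch = motion_prior_loss(xy0, xy1, np.zeros((2, 2)))
```

In the actual framework this residual would be one term in a joint objective alongside photometric RGB losses, so the event stream disambiguates large inter-frame motion while the images supply color.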