STD-GS: Exploring Frame-Event Interaction for SpatioTemporal-Disentangled Gaussian Splatting to Reconstruct High-Dynamic Scene

πŸ“… 2025-06-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In highly dynamic scenes, conventional unified spatiotemporal representation models suffer from spatiotemporal mismatch due to intrinsic heterogeneity between static backgrounds and moving objects in both spatial appearance and temporal motion characteristics. Method: This paper proposes a spatiotemporally decoupled Gaussian splatting framework that fuses frame-camera and event-camera data. It introduces, for the first time, event-stream-guided spatiotemporal decoupling of Gaussian representations: leveraging the temporal continuity of events and the consistency prior between events and Gaussian spatiotemporal modeling, it explicitly separates background and object appearance/motion features via clustering and establishes a continuous spatiotemporal modulation mechanism. Contribution/Results: Unlike existing methods, ours avoids temporal discretization and spatial heterogeneity artifacts induced by frame-based imaging. Experiments demonstrate significant improvements in spatiotemporal consistency and dynamic content rendering quality for neural reconstruction in high-motion scenarios.

πŸ“ Abstract
High-dynamic scene reconstruction aims to represent the static background with rigid spatial features and dynamic objects with continuously deforming spatiotemporal features. Existing methods typically adopt a unified representation model (e.g., Gaussian) to directly match the spatiotemporal features of a dynamic scene captured by a frame camera. However, this unified paradigm fails to capture the potentially discontinuous temporal features of objects caused by frame imaging, as well as the heterogeneous spatial features between background and objects. To address this issue, we disentangle the spatiotemporal features into separate latent representations to alleviate the spatiotemporal mismatch between background and objects. In this work, we introduce an event camera to compensate for the frame camera, and propose a spatiotemporal-disentangled Gaussian splatting framework for high-dynamic scene reconstruction. For the dynamic scene, we observe that background and objects exhibit an appearance discrepancy in frame-based spatial features and a motion discrepancy in event-based temporal features, which motivates us to distinguish the spatiotemporal features of background and objects via clustering. For dynamic objects, we find that Gaussian representations and event data share consistent spatiotemporal characteristics, which can serve as a prior to guide the spatiotemporal disentanglement of object Gaussians. Within the Gaussian splatting framework, the cumulative scene-object disentanglement improves the spatiotemporal discrimination between background and objects, enabling rendering of a time-continuous dynamic scene. Extensive experiments verify the superiority of the proposed method.
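The abstract's core separation step — distinguishing background from object Gaussians by clustering frame-based appearance features jointly with event-based motion features — can be sketched with a toy k-means. All names, feature dimensions, and the initialization scheme below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def cluster_gaussians(appearance, motion, k=2, iters=20):
    """Separate background/object Gaussians by clustering joint features.

    Toy k-means over concatenated frame-based appearance features and
    event-based motion features. Farthest-point seeding keeps the result
    deterministic when the two groups are well separated.
    """
    feats = np.concatenate([appearance, motion], axis=1)
    centers = [feats[0]]
    for _ in range(k - 1):  # greedy farthest-point initialization
        d = np.min([np.linalg.norm(feats - c, axis=1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):  # standard Lloyd iterations
        dist = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels

# Synthetic Gaussians: static background has near-zero event/motion energy,
# dynamic objects carry strong event-driven motion features.
rng = np.random.default_rng(1)
bg_app, bg_mot = rng.normal(0.0, 0.1, (50, 3)), np.zeros((50, 2))
obj_app, obj_mot = rng.normal(1.0, 0.1, (20, 3)), np.full((20, 2), 2.0)
labels = cluster_gaussians(np.vstack([bg_app, obj_app]),
                           np.vstack([bg_mot, obj_mot]))
```

Because background Gaussians have near-zero motion energy while object Gaussians do not, the motion channels alone are often enough to split the two groups; the appearance channels sharpen the separation when motion is ambiguous.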
Problem

Research questions and friction points this paper is trying to address.

Reconstruct high-dynamic scenes with disentangled spatiotemporal features
Address the spatiotemporal mismatch between background and dynamic objects
Combine frame and event cameras for improved scene representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangles spatiotemporal features via latent representations
Integrates an event camera to compensate for the frame camera
Uses clustering to distinguish scene-object features
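The event/Gaussian consistency prior mentioned above rests on the standard event-camera generation model: a pixel fires an event whenever its log-intensity change since the last event crosses a contrast threshold, yielding temporally continuous motion cues between frames. A minimal simulation under that model (threshold value and function names are hypothetical, not from the paper):

```python
import numpy as np

def simulate_events(log_intensity, threshold=0.2):
    """Emit per-pixel events when |delta log I| crosses the contrast threshold.

    log_intensity: array of shape (T, H, W), a video in log space.
    Returns a list of (t, y, x, polarity) tuples, as in the standard
    event-camera generation model.
    """
    ref = log_intensity[0].copy()  # per-pixel reference level
    events = []
    for t in range(1, log_intensity.shape[0]):
        diff = log_intensity[t] - ref
        ys, xs = np.where(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            pol = 1 if diff[y, x] > 0 else -1
            events.append((t, int(y), int(x), pol))
            ref[y, x] = log_intensity[t, y, x]  # reset reference after firing
    return events

# A single pixel brightens once; only that pixel, at that time, fires.
video = np.zeros((3, 4, 4))
video[1:, 1, 1] = 1.0
events = simulate_events(video)
```

Static pixels never fire, which is exactly why event streams isolate moving objects: the background produces no events, while object edges generate a dense, temporally continuous signal between frame exposures.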
πŸ”Ž Similar Papers
No similar papers found.