🤖 AI Summary
To address geometric drift and temporal inconsistency in 4D Gaussian Splatting (4DGS) reconstruction of dynamic endoscopic scenes, this paper proposes a jointly trained, geometry-guided, and temporally aware framework. Methodologically, it introduces confidence-gated monocular depth distillation as a strong geometric prior; designs XYZT spatiotemporal embeddings and a screw-motion parameterization to jointly model rigid and non-rigid deformations; and integrates keyframe constraints with streaming optimization, alongside a scale-invariant depth loss, depth-gradient regularization, and a soft warm-up schedule. Evaluated on EndoNeRF and StereoMIS-P1, the method achieves state-of-the-art monocular 4D reconstruction performance, significantly improving geometric accuracy and long-sequence temporal coherence while maintaining efficient inference.
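To make the depth supervision concrete, below is a minimal sketch of the two loss terms named above: a scale-invariant log-depth loss (in the style of Eigen et al.) restricted to confidence-gated pixels, and an L1 depth-gradient penalty. The function names, the `lam` balance term, and the boolean confidence mask are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def scale_invariant_depth_loss(pred, gt, conf_mask, lam=0.5):
    """Scale-invariant log-depth loss over confidence-gated pixels.

    With d = log(pred) - log(gt), the loss is mean(d^2) - lam * mean(d)^2;
    at lam = 1 a global scale error on `pred` incurs zero penalty.
    `lam` and the gating are assumptions for illustration.
    """
    d = np.log(pred[conf_mask]) - np.log(gt[conf_mask])
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)

def depth_gradient_loss(pred, gt):
    """L1 penalty on differences of finite-difference depth gradients,
    encouraging the rendered depth to keep the prior's edge structure."""
    dx = np.abs(np.diff(pred, axis=1) - np.diff(gt, axis=1))
    dy = np.abs(np.diff(pred, axis=0) - np.diff(gt, axis=0))
    return float(dx.mean() + dy.mean())
```

Note the scale-invariant term's key property: with `lam=1.0`, predicting depth correct only up to a global scale (e.g. `pred = 2 * gt`) gives zero loss, which is exactly what is wanted when distilling from a monocular depth network whose outputs have no metric scale.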
📝 Abstract
Endoscopic (endo) video exhibits strong view-dependent effects such as specularities, wet reflections, and occlusions. Pure photometric supervision misaligns with geometry and triggers early geometric drift, where erroneous shapes are reinforced during densification and become hard to correct. We ask how to anchor geometry early for 4D Gaussian splatting (4DGS) while maintaining temporal consistency and efficiency in dynamic endoscopic scenes. Thus, we present Endo-G$^{2}$T, a geometry-guided and temporally aware training scheme for time-embedded 4DGS. First, geo-guided prior distillation converts confidence-gated monocular depth into supervision with scale-invariant depth and depth-gradient losses, using a warm-up-to-cap schedule to inject priors softly and avoid early overfitting. Second, a time-embedded Gaussian field represents dynamics in XYZT with a rotor-like rotation parameterization, yielding temporally coherent geometry with lightweight regularization that favors smooth motion and crisp opacity boundaries. Third, keyframe-constrained streaming improves efficiency and long-horizon stability through keyframe-focused optimization under a max-points budget, while non-keyframes advance with lightweight updates. Across the EndoNeRF and StereoMIS-P1 datasets, Endo-G$^{2}$T achieves state-of-the-art results among monocular reconstruction baselines.
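The "warm-up-to-cap" schedule in the abstract ramps the depth-prior weight up gradually so the prior does not dominate before the Gaussians have settled, then holds it at a cap. A minimal linear sketch is below; the function name and the `warmup_steps`/`cap` parameters are assumptions for illustration, and the paper's actual schedule shape may differ.

```python
def prior_weight(step, warmup_steps=1000, cap=1.0):
    """Linearly ramp the depth-prior loss weight from 0 to `cap`
    over `warmup_steps` iterations, then hold it at `cap`.

    Injecting the monocular-depth prior softly like this avoids
    overfitting to the (noisy, scale-ambiguous) prior early on.
    """
    return cap * min(step / warmup_steps, 1.0)
```

The total geometric loss at iteration `t` would then be scaled as `prior_weight(t) * depth_loss`, so early iterations are driven mostly by photometric terms and the geometric anchor takes over progressively.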