From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting

📅 2025-10-03
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Monocular dynamic 3D reconstruction suffers from geometric ambiguity and high computational cost. Existing sparse control strategies rely solely on geometric distribution of control points, leading to redundant sampling in static regions and insufficient coverage in dynamic areas. To address this, we propose a semantic-guided, motion-adaptive control framework: leveraging vision foundation models to extract patch-level semantic and motion priors, we establish a patch-token-node mapping; integrating iterative voxelization with motion-trend scoring enables adaptive allocation of control point density according to local motion complexity; and we replace MLP-based deformation fields with spline-parameterized trajectory modeling, initialized via 2D trajectory estimation for enhanced optimization stability. Evaluated on multiple benchmarks, our method significantly outperforms state-of-the-art approaches, achieving simultaneous improvements in reconstruction accuracy, optimization stability, and computational efficiency.

📝 Abstract
Dynamic 3D reconstruction from monocular videos remains difficult due to the ambiguity of inferring 3D motion from limited views and the computational demands of modeling temporally varying scenes. While recent sparse control methods alleviate computation by reducing millions of Gaussians to thousands of control points, they suffer from a critical limitation: they allocate points purely by geometry, leading to redundancy in static regions and insufficient coverage in dynamic ones. We propose a motion-adaptive framework that aligns control density with motion complexity. Leveraging semantic and motion priors from vision foundation models, we establish patch-token-node correspondences and apply motion-adaptive compression to concentrate control points in dynamic regions while suppressing redundancy in static backgrounds. Our approach achieves flexible representational density adaptation through iterative voxelization and motion tendency scoring, directly addressing the fundamental mismatch between control point allocation and motion complexity. To capture temporal evolution, we introduce spline-based trajectory parameterization initialized from 2D tracklets, replacing MLP-based deformation fields to achieve smoother motion representation and more stable optimization. Extensive experiments demonstrate significant improvements in reconstruction quality and efficiency over existing state-of-the-art methods.
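The abstract's core allocation idea, concentrating control points in dynamic regions via motion scoring and multi-resolution voxelization, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the two-level voxel sizes, the threshold, and the function names are all assumptions.

```python
import numpy as np

def adaptive_control_points(points, motion_scores, coarse_size=0.5,
                            fine_size=0.125, threshold=0.3):
    """Allocate control nodes by voxelizing finely where motion is high.

    points: (N, 3) Gaussian centers; motion_scores: (N,) per-point motion
    tendency in [0, 1]. The two-level coarse/fine scheme is an illustrative
    stand-in for the paper's iterative voxelization.
    """
    def voxel_centers(pts, size):
        # Hash each point to a voxel key, then average points per voxel.
        keys = np.floor(pts / size).astype(np.int64)
        _, inverse = np.unique(keys, axis=0, return_inverse=True)
        centers = np.zeros((inverse.max() + 1, 3))
        counts = np.bincount(inverse).astype(float)
        np.add.at(centers, inverse, pts)
        return centers / counts[:, None]

    dynamic = motion_scores > threshold
    static_nodes = voxel_centers(points[~dynamic], coarse_size)   # sparse
    dynamic_nodes = voxel_centers(points[dynamic], fine_size)     # dense
    return np.vstack([static_nodes, dynamic_nodes])
```

Under this scheme a static background occupies few coarse voxels (few nodes), while a dynamic region of the same spatial extent is covered by many fine voxels, which is the density-to-motion alignment the abstract describes.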
Problem

Research questions and friction points this paper is trying to address.

Dynamic 3D reconstruction from monocular videos remains difficult
Control point allocation mismatches motion complexity in scenes
Existing methods struggle with smooth temporal motion representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-guided motion control for dynamic 3D Gaussian splatting
Motion-adaptive compression concentrating control points in dynamic regions
Spline-based trajectory parameterization replacing MLP deformation fields
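The third contribution, spline trajectories in place of an MLP deformation field, can be illustrated with a Catmull-Rom spline evaluated at a normalized time. This is a generic sketch under assumed conventions (knot layout, time normalization), not the paper's exact formulation; the knots would be initialized from lifted 2D tracklets.

```python
import numpy as np

def spline_trajectory(knots, t):
    """Evaluate a Catmull-Rom spline through per-node knot positions.

    knots: (K, 3) control positions over time; t: scalar in [0, 1].
    Illustrative stand-in for the paper's spline parameterization.
    """
    K = len(knots)
    s = t * (K - 1)                       # map t to knot-index space
    i = min(int(np.floor(s)), K - 2)      # segment index
    u = s - i                             # local parameter in [0, 1]
    p0 = knots[max(i - 1, 0)]             # clamp endpoints
    p1, p2 = knots[i], knots[i + 1]
    p3 = knots[min(i + 2, K - 1)]
    m1 = 0.5 * (p2 - p0)                  # Catmull-Rom tangents
    m2 = 0.5 * (p3 - p1)
    h = np.array([2*u**3 - 3*u**2 + 1, u**3 - 2*u**2 + u,
                  -2*u**3 + 3*u**2, u**3 - u**2])  # Hermite basis
    return h[0]*p1 + h[1]*m1 + h[2]*p2 + h[3]*m2
```

Because the trajectory is a low-dimensional, C1-continuous function of its knots, gradients flow through a handful of parameters per node rather than a shared MLP, which is consistent with the smoother motion and more stable optimization the summary claims.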
Jianing Chen — Institute of Computing Technology, Chinese Academy of Sciences (ICT)
Zehao Li — Peking University (Operations Research · Stochastic Approximation)
Yujun Cai — Lecturer (Assistant Professor) @ UQ; previously NTU → Meta (Multi-Modal Perception · Vision-Language Models)
Hao Jiang — Institute of Computing Technology, Chinese Academy of Sciences (ICT)
Shuqin Gao — Institute of Computing Technology, Chinese Academy of Sciences (ICT)
Honglong Zhao — Institute of Computing Technology, Chinese Academy of Sciences (ICT)
Tianlu Mao — Institute of Computing Technology, Chinese Academy of Sciences (ICT)
Yucheng Zhang — Purdue University (Knowledge Graph · Large Language Models)