WorldTree: Towards 4D Dynamic Worlds from Monocular Video using Tree-Chains

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for dynamic scene reconstruction from monocular video lack a unified spatiotemporal decomposition framework: they rely on either holistic temporal optimization or coupled hierarchical spatial composition. To address this limitation, this work proposes WorldTree, a framework that introduces a Temporal Partition Tree (TPT) for coarse-to-fine hierarchical temporal decomposition and Spatial Ancestral Chains (SAC) that recursively query the ancestral hierarchy. By combining complementary spatial dynamics with motion representations specialized across ancestral nodes, WorldTree aims for decoupled yet unified 4D reconstruction. Experiments report an 8.26% improvement in LPIPS on NVIDIA-LS and a 9.09% improvement in mLPIPS on DyCheck over the second-best method.

📝 Abstract
Dynamic reconstruction has achieved remarkable progress, but monocular input, which matters for more practical applications, remains challenging. Prevailing works attempt to construct efficient motion representations but lack a unified spatiotemporal decomposition framework, suffering from either holistic temporal optimization or coupled hierarchical spatial composition. To this end, we propose WorldTree, a unified framework comprising a Temporal Partition Tree (TPT), which enables coarse-to-fine optimization based on an inheritance-based partition tree structure for hierarchical temporal decomposition, and Spatial Ancestral Chains (SAC), which recursively query the ancestral hierarchical structure to provide complementary spatial dynamics while specializing motion representations across ancestral nodes. Experimental results on different datasets indicate that our proposed method achieves an 8.26% improvement in LPIPS on NVIDIA-LS and a 9.09% improvement in mLPIPS on DyCheck compared to the second-best method. Code: https://github.com/iCVTEAM/WorldTree.
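
To make the two structures named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a binary split of the frame range at each level and treats an "ancestral chain" as the root-to-leaf path of the segment containing a given frame. The names `Node`, `build_tree`, `leaf_for`, and `ancestral_chain` are hypothetical, and the actual partition rule and motion representation used by WorldTree are not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One temporal segment [start, end) of the video (hypothetical structure)."""
    start: int
    end: int
    # Exclude parent from repr/eq to avoid infinite recursion through children.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)


def build_tree(start: int, end: int, depth: int,
               parent: Optional[Node] = None) -> Node:
    """Recursively halve the frame range (assumed binary split)."""
    node = Node(start, end, parent)
    if depth > 0 and end - start > 1:
        mid = (start + end) // 2
        node.children = [
            build_tree(start, mid, depth - 1, node),
            build_tree(mid, end, depth - 1, node),
        ]
    return node


def leaf_for(root: Node, t: int) -> Node:
    """Descend to the finest segment that contains frame t."""
    node = root
    while node.children:
        node = next(c for c in node.children if c.start <= t < c.end)
    return node


def ancestral_chain(node: Node) -> List[Node]:
    """Walk parent links and return the path ordered from root to node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain[::-1]


root = build_tree(0, 8, depth=2)
spans = [(n.start, n.end) for n in ancestral_chain(leaf_for(root, 5))]
print(spans)  # [(0, 8), (4, 8), (4, 6)]
```

In this toy setup, frame 5 of an 8-frame clip is covered by three nested segments, from the whole clip down to frames 4 and 5. A coarse-to-fine scheme in this spirit would fit the coarse segments first and let finer segments inherit from and refine their ancestors.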
Problem

Research questions and friction points this paper is trying to address.

dynamic reconstruction
monocular video
spatiotemporal decomposition
temporal optimization
hierarchical spatial composition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Partition Tree
Spatial Ancestral Chains
4D Dynamic Reconstruction
Monocular Video
Hierarchical Spatiotemporal Decomposition
Qisen Wang
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering & Qingdao Research Institute, Beihang University
Yifan Zhao
School of Computer Science and Engineering, Beihang University
Computer Vision, Computer Graphics, VR/AR
Jia Li
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering & Qingdao Research Institute, Beihang University