4D-MoDe: Towards Editable and Scalable Volumetric Streaming via Motion-Decoupled 4D Gaussian Compression

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of large data volume, complex motion modeling, and poor editability in dynamic volumetric video streaming, this paper proposes a hierarchical, motion-decoupled 4D Gaussian representation. The scene is decomposed into static background and dynamic foreground components; temporal consistency is maintained via adaptive keyframe insertion, and the foreground is encoded for independent streaming. A multi-resolution motion estimation grid, a lightweight shared MLP, entropy-aware training, and joint range coding with KD-tree compression are combined to achieve rate-distortion optimization. Evaluated on multiple standard benchmarks, the method achieves an average storage cost of only 11.4 KB per frame while matching state-of-the-art reconstruction quality. Crucially, it enables fine-grained editing operations, such as background replacement, thereby significantly improving volumetric video editability, scalability, and streaming efficiency.
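The static/dynamic decomposition described above can be illustrated with a minimal sketch: Gaussians whose centers barely move over a lookahead window are treated as static background, the rest as dynamic foreground. The threshold and window length below are illustrative assumptions, not the paper's actual parameters, and the paper's lookahead-based decomposition is certainly more involved than this toy version.

```python
import numpy as np

def decompose_static_dynamic(positions, threshold=0.01, lookahead=5):
    """Split Gaussian centers into static/dynamic sets by lookahead motion.

    positions: (T, N, 3) array of Gaussian centers over T frames.
    A Gaussian whose maximum displacement from the reference frame within
    the lookahead window stays below `threshold` is treated as static
    background; the rest form the dynamic foreground.
    """
    ref = positions[0]                     # reference-frame centers
    window = positions[1:1 + lookahead]    # lookahead frames
    # max displacement of each Gaussian within the window
    disp = np.linalg.norm(window - ref, axis=-1).max(axis=0)
    static_mask = disp < threshold
    return static_mask, ~static_mask

# toy example: 4 Gaussians; the first two stay put, the last two drift
T, N = 6, 4
pos = np.zeros((T, N, 3))
pos[:, 2:] += np.linspace(0.0, 0.5, T)[:, None, None]  # drifting foreground
static, dynamic = decompose_static_dynamic(pos)
```

Only the dynamic set then needs per-frame motion coding; the static set can be sent once per keyframe group.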

📝 Abstract
Volumetric video has emerged as a key medium for immersive telepresence and augmented/virtual reality, enabling six-degrees-of-freedom (6DoF) navigation and realistic spatial interactions. However, delivering high-quality dynamic volumetric content at scale remains challenging due to massive data volume, complex motion, and limited editability of existing representations. In this paper, we present 4D-MoDe, a motion-decoupled 4D Gaussian compression framework designed for scalable and editable volumetric video streaming. Our method introduces a layered representation that explicitly separates static backgrounds from dynamic foregrounds using a lookahead-based motion decomposition strategy, significantly reducing temporal redundancy and enabling selective background/foreground streaming. To capture continuous motion trajectories, we employ a multi-resolution motion estimation grid and a lightweight shared MLP, complemented by a dynamic Gaussian compensation mechanism to model emergent content. An adaptive grouping scheme dynamically inserts background keyframes to balance temporal consistency and compression efficiency. Furthermore, an entropy-aware training pipeline jointly optimizes the motion fields and Gaussian parameters under a rate-distortion (RD) objective, while employing range-based and KD-tree compression to minimize storage overhead. Extensive experiments on multiple datasets demonstrate that 4D-MoDe consistently achieves competitive reconstruction quality with an order of magnitude lower storage cost (e.g., as low as 11.4 KB/frame) compared to state-of-the-art methods, while supporting practical applications such as background replacement and foreground-only streaming.
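The multi-resolution motion grid plus shared MLP mentioned in the abstract can be sketched as follows: features are interpolated from grids at several resolutions, concatenated, and mapped by one small shared network to a per-Gaussian offset. Everything here is an illustrative stand-in — the grid sizes, feature dimension, 2D (space, time) lookup, and random MLP weights are assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilerp(grid, x, t):
    """Bilinear lookup in an (H, W, C) feature grid at continuous (x, t) in [0, 1]."""
    H, W, _ = grid.shape
    fx, ft = x * (H - 1), t * (W - 1)
    x0, t0 = int(fx), int(ft)
    x1, t1 = min(x0 + 1, H - 1), min(t0 + 1, W - 1)
    wx, wt = fx - x0, ft - t0
    return ((1 - wx) * (1 - wt) * grid[x0, t0] + wx * (1 - wt) * grid[x1, t0]
            + (1 - wx) * wt * grid[x0, t1] + wx * wt * grid[x1, t1])

# three grid resolutions (coarse -> fine), each storing 4-d motion features
levels = [rng.normal(size=(r, r, 4)) for r in (4, 8, 16)]

# a tiny shared MLP (random weights, for illustration) mapping
# the concatenated 12-d feature to a 3-d positional offset
W1 = rng.normal(size=(12, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3));  b2 = np.zeros(3)

def motion_offset(x, t):
    feat = np.concatenate([bilerp(g, x, t) for g in levels])  # 3 levels x 4 dims
    h = np.maximum(feat @ W1 + b1, 0.0)                       # ReLU hidden layer
    return h @ W2 + b2                                        # predicted xyz offset

offset = motion_offset(0.3, 0.7)
```

Sharing one MLP across all Gaussians keeps the per-frame payload dominated by the (compressible) grid features rather than per-point parameters.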
Problem

Research questions and friction points this paper is trying to address.

Delivering high-quality dynamic volumetric content at scale remains challenging
Existing representations suffer from massive data volume and limited editability
Complex motion patterns create temporal redundancy in volumetric streaming
Innovation

Methods, ideas, or system contributions that make the work stand out.

Motion-decoupled 4D Gaussian compression separates static and dynamic content
Multi-resolution motion estimation with dynamic Gaussian compensation mechanism
Entropy-aware training optimizes motion fields and Gaussian parameters jointly
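The entropy-aware, rate-distortion-driven training named above follows the standard form L = D + λ·R. The sketch below is a toy illustration of that objective only: MSE distortion plus a bit estimate for quantized parameters under a fixed zero-mean Gaussian entropy model. The paper's learned entropy model and λ value are not specified here; both are assumptions.

```python
import numpy as np

def rd_loss(rendered, target, params, lam=1e-4, sigma=1.0):
    """Toy rate-distortion objective: distortion + lambda * estimated bits.

    Distortion is MSE between rendered and ground-truth frames. The rate
    term approximates the bits needed to code quantized Gaussian/motion
    parameters under a zero-mean Gaussian density (an illustrative
    stand-in for a learned entropy model).
    """
    distortion = np.mean((rendered - target) ** 2)
    q = np.round(params)  # simulated quantization
    # (unnormalized) likelihood of each quantized symbol under N(0, sigma^2)
    p = np.exp(-q ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    p = np.clip(p, 1e-9, 1.0)
    rate_bits = -np.sum(np.log2(p))       # estimated total bits
    return distortion + lam * rate_bits

rng = np.random.default_rng(1)
loss = rd_loss(rng.normal(size=(8, 8)), np.zeros((8, 8)), rng.normal(size=100))
```

Raising `lam` trades reconstruction quality for a smaller bitstream, which is how a single framework can target different storage budgets.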
👥 Authors
Houqiang Zhong
Shanghai Jiao Tong University
Zihan Zheng
Ph.D. Candidate, Shanghai Jiao Tong University
artificial intelligence, deep learning, computer vision
Qiang Hu
Shanghai Jiao Tong University, Shanghai, 200240, China
Yuan Tian
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Ning Cao
E-surfing Vision Technology Co., Ltd, Hangzhou, 311100, China
Lan Xu
ShanghaiTech University, Shanghai, 201210, China
Xiaoyun Zhang
Shanghai Jiao Tong University, Shanghai, 200240, China
Zhengxue Cheng
Assistant Researcher, Shanghai Jiao Tong University
Video and Image Coding, Computer Vision, Image Quality Assessment
Li Song
Professor of Electronic Engineering, Shanghai Jiao Tong University
Video Coding, Image Processing, Computer Vision
Wenjun Zhang
City University of Hong Kong
Thin film technology, nanomaterials and nanodevices