Orthogonal Spatial-temporal Distributional Transfer for 4D Generation

📅 2026-03-05
🤖 AI Summary
High-quality 4D content generation is hindered by the scarcity of large-scale 4D datasets, which prevents models from adequately learning spatiotemporal features. To address this, the work proposes STD-4D, a diffusion model that disentangles spatial and temporal modeling by transferring spatial priors from 3D diffusion models and temporal priors from video diffusion models. It introduces two key components: an Orthogonal Spatial-temporal Distributional Transfer (Orster) mechanism that models the spatiotemporal feature distributions and injects them into STD-4D, and a spatial-temporal-aware HexPlane (ST-HexPlane) representation that integrates the transferred features during 4D construction. Experiments show that the method significantly outperforms existing approaches in both 4D generation quality and spatial-temporal consistency.

📝 Abstract
In the AIGC era, generating high-quality 4D content has garnered increasing research attention. However, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets. Without such data, models cannot adequately learn the spatial-temporal features that high-quality 4D generation requires, which hinders progress in this domain. To address this, we propose a novel framework that transfers rich spatial priors from existing 3D diffusion models and temporal priors from video diffusion models to enhance 4D synthesis. We develop a spatial-temporal-disentangled 4D (STD-4D) Diffusion model, which synthesizes 4D-aware videos through disentangled spatial and temporal latents. To facilitate feature transfer, we design a novel Orthogonal Spatial-temporal Distributional Transfer (Orster) mechanism, in which the spatiotemporal feature distributions are modeled and injected into the STD-4D Diffusion model. Furthermore, during 4D construction, we devise a spatial-temporal-aware HexPlane (ST-HexPlane) to integrate the transferred spatiotemporal features, thereby improving 4D deformation and 4D Gaussian feature modeling. Experiments demonstrate that our method significantly outperforms existing approaches, achieving superior spatial-temporal consistency and higher-quality 4D synthesis.
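
For readers unfamiliar with the HexPlane idea that ST-HexPlane builds on, the sketch below illustrates the generic six-plane factorization of a 4D (x, y, z, t) feature field: three spatial planes (XY, XZ, YZ) are each paired with the complementary space-time plane (ZT, YT, XT), the paired features are multiplied, and the results are concatenated. This is only an illustration of the plain HexPlane lookup under simple assumptions (pure-Python grids, bilinear interpolation, normalized coordinates in [0, 1]). It is not the paper's ST-HexPlane, whose way of integrating the transferred features is not described in this summary, and the names `make_plane`, `bilinear`, and `hexplane_feature` are hypothetical.

```python
import math


def make_plane(rows, cols, channels, fn):
    """Build a rows x cols grid of feature vectors from fn(row, col, channel)."""
    return [[[fn(r, c, k) for k in range(channels)] for c in range(cols)]
            for r in range(rows)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at normalized coords u, v in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    r = min(max(u, 0.0), 1.0) * (rows - 1)
    c = min(max(v, 0.0), 1.0) * (cols - 1)
    r0, c0 = int(math.floor(r)), int(math.floor(c))
    r1, c1 = min(r0 + 1, rows - 1), min(c0 + 1, cols - 1)
    fr, fc = r - r0, c - c0
    return [
        (1 - fr) * (1 - fc) * plane[r0][c0][k]
        + (1 - fr) * fc * plane[r0][c1][k]
        + fr * (1 - fc) * plane[r1][c0][k]
        + fr * fc * plane[r1][c1][k]
        for k in range(len(plane[0][0]))
    ]


def hexplane_feature(planes, x, y, z, t):
    """Query a 4D point: multiply each spatial plane's feature with that of
    its complementary space-time plane, then concatenate the three products."""
    coords = {"x": x, "y": y, "z": z, "t": t}
    pairs = [("xy", "zt"), ("xz", "yt"), ("yz", "xt")]
    feature = []
    for spatial, temporal in pairs:
        fs = bilinear(planes[spatial], coords[spatial[0]], coords[spatial[1]])
        ft = bilinear(planes[temporal], coords[temporal[0]], coords[temporal[1]])
        feature.extend(a * b for a, b in zip(fs, ft))
    return feature


# Toy usage: six 4x4 planes with 2 channels each (values are arbitrary).
planes = {
    name: make_plane(4, 4, 2, lambda r, c, k: 1.0 + 0.1 * r + 0.01 * c + k)
    for name in ("xy", "xz", "yz", "xt", "yt", "zt")
}
feat = hexplane_feature(planes, 0.5, 0.5, 0.5, 0.0)  # 3 pairs x 2 channels
```

The appeal of this factorization is that memory grows with the area of six 2D planes rather than with the volume of a dense 4D grid, and that spatial and temporal information live in separate planes, which fits the disentangled spatial and temporal features described in the abstract.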
Problem

Research questions and friction points this paper is trying to address.

4D generation
spatial-temporal features
data scarcity
AIGC
4D synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Spatial-temporal Transfer
4D Generation
Diffusion Models
Spatiotemporal Disentanglement
ST-HexPlane