CoCo4D: Comprehensive and Complex 4D Scene Generation

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 4D synthesis methods are largely restricted to object-level generation or exhibit weak generalization across novel viewpoints, hindering the creation of multi-view-consistent and immersive dynamic 4D scenes. To address this, we propose a text- (optionally image-) driven disentangled 4D generation framework that separately models dynamic foreground motion and background evolution. Specifically, we leverage video diffusion models to generate reference action sequences for the foreground and employ optimizable parametric trajectories to ensure natural foreground integration. Background and foreground are synthesized independently via progressive outpainting and then jointly rendered in a coordinated manner. Our approach significantly improves spatiotemporal consistency and geometric fidelity. Quantitative and qualitative evaluations demonstrate superior performance over state-of-the-art methods in terms of multi-view dynamic scene quality, detail richness, generalization capability, and computational efficiency.

📝 Abstract
Existing 4D synthesis methods primarily focus on object-level generation or dynamic scene synthesis with limited novel views, restricting their ability to generate multi-view consistent and immersive dynamic 4D scenes. To address these constraints, we propose a framework (dubbed CoCo4D) for generating detailed dynamic 4D scenes from text prompts, optionally conditioned on images. Our method leverages the crucial observation that articulated motion typically characterizes foreground objects, whereas background alterations are less pronounced. Consequently, CoCo4D divides 4D scene synthesis into two responsibilities: modeling the dynamic foreground and creating the evolving background, both directed by a reference motion sequence. Given a text prompt and an optional reference image, CoCo4D first generates an initial motion sequence using video diffusion models. This motion sequence then guides the synthesis of both the dynamic foreground object and the background via a novel progressive outpainting scheme. To ensure seamless integration of the moving foreground object within the dynamic background, CoCo4D optimizes a parametric trajectory for the foreground, resulting in realistic and coherent blending. Extensive experiments show that CoCo4D achieves comparable or superior performance in 4D scene generation compared to existing methods, demonstrating its effectiveness and efficiency. More results are presented on our website https://colezwhy.github.io/coco4d/.
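The "progressive outpainting" idea above — growing the scene outward from an initial view in successive rounds — can be sketched as follows. This is a minimal illustration, not the paper's implementation: here simple edge replication stands in for the diffusion inpainting model that would actually synthesize each new border region, and all sizes are hypothetical.

```python
import numpy as np

def progressive_outpaint(image, rounds=3, border=8):
    """Sketch of progressive outpainting: repeatedly enlarge the canvas
    and fill the newly exposed border region.

    In a real pipeline, the padded border would be synthesized by a
    diffusion inpainting model conditioned on the existing content;
    edge replication below is only a placeholder for that step.
    """
    canvas = image
    for _ in range(rounds):
        # Enlarge the canvas; the new border ring is the region a
        # diffusion model would fill in (placeholder: replicate edges).
        canvas = np.pad(canvas, ((border, border), (border, border), (0, 0)),
                        mode="edge")
    return canvas

seed = np.random.rand(32, 32, 3)  # hypothetical initial view of the scene
scene = progressive_outpaint(seed)
print(scene.shape)  # → (80, 80, 3): 32 + 2 * 3 * 8 per spatial axis
```

Growing the canvas in several small rounds, rather than one large jump, keeps each inpainting step well conditioned on nearby known content, which is the intuition behind the progressive scheme.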
Problem

Research questions and friction points this paper is trying to address.

Generating multi-view consistent 4D dynamic scenes
Separating foreground and background motion synthesis
Ensuring realistic blending of dynamic elements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Divides 4D scene synthesis into dynamic foreground and evolving background
Uses video diffusion models for initial motion sequence generation
Optimizes parametric trajectory for seamless foreground-background blending
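The last point — fitting a parametric trajectory so the foreground follows the reference motion — can be illustrated with a small optimization sketch. Everything here is an assumption for illustration: a quadratic Bezier curve as the parametric form, a squared-distance loss to the reference path, and finite-difference gradient descent; the paper's actual trajectory parameterization and objective may differ.

```python
import numpy as np

def bezier(control_pts, t):
    """Evaluate a quadratic Bezier curve at parameters t (shape [T])."""
    p0, p1, p2 = control_pts
    t = t[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

def trajectory_loss(control_pts, ref_path):
    """Mean squared distance between the trajectory and a reference path."""
    t = np.linspace(0.0, 1.0, len(ref_path))
    return np.mean(np.sum((bezier(control_pts, t) - ref_path) ** 2, axis=-1))

def optimize_trajectory(ref_path, steps=500, lr=0.5, eps=1e-4):
    """Fit the control points by finite-difference gradient descent."""
    pts = np.zeros((3, 2))
    for _ in range(steps):
        grad = np.zeros_like(pts)
        for i in range(pts.shape[0]):
            for j in range(pts.shape[1]):
                plus, minus = pts.copy(), pts.copy()
                plus[i, j] += eps
                minus[i, j] -= eps
                grad[i, j] = (trajectory_loss(plus, ref_path)
                              - trajectory_loss(minus, ref_path)) / (2 * eps)
        pts -= lr * grad
    return pts

# Hypothetical reference path: 2D foreground positions extracted from
# the generated motion sequence (here, a synthetic arc).
ref = np.stack([np.linspace(0, 1, 16),
                0.5 * np.sin(np.linspace(0, np.pi, 16))], axis=-1)
fitted = optimize_trajectory(ref)
print(trajectory_loss(fitted, ref))  # small residual after fitting
```

A smooth low-dimensional trajectory like this is easy to optimize and keeps the composited foreground motion coherent, which is presumably why a parametric form is preferred over per-frame placement.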