Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

📅 2026-04-23

🤖 AI Summary
High-fidelity dynamic 4D generation is hindered by temporal artifacts, substantial computational cost, and a scarcity of training data. This work addresses these challenges with a temporally decaying sparse attention mechanism built on the pretrained 3D diffusion Transformer Hunyuan3D 2.1. The approach anchors object identity to the initial frame and models spatiotemporal dynamics through block-sparse attention combined with a time-decay mask. It reduces the network's total computation by 56% while preserving identity consistency, and the authors report state-of-the-art temporal coherence and generation quality for 4D content. By easing both data and compute requirements, the work points toward efficient and scalable 4D generation.

📝 Abstract
Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demand. We present Sculpt4D, a native 4D generative framework that integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design models complex spatiotemporal dependencies with high fidelity while sidestepping the quadratic overhead of full attention, reducing the network's total computation by 56%. Consequently, Sculpt4D establishes a new state of the art in temporally coherent 4D synthesis and charts a path toward efficient and scalable 4D generation.
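The abstract does not give the exact form of the mask, so the following is only a minimal NumPy sketch of one way a first-frame anchor and a time-decaying block-sparse mask could be combined. Everything specific here is an assumption for illustration: the function names `time_decay_block_mask` and `masked_attention`, the `decay` parameter, the geometric decay rule, the bidirectional (non-causal) layout, and the rule that keeps the first few key blocks of each frame. None of it is taken from Sculpt4D or Hunyuan3D 2.1.

```python
import numpy as np


def time_decay_block_mask(num_frames, blocks_per_frame, decay=0.5):
    """Boolean block mask of shape (F*B, F*B); True means the block pair is attended.

    Assumed rule (hypothetical, not from the paper):
      * every query frame sees all blocks of frame 0 (identity anchor);
      * every query frame sees all blocks of its own frame;
      * for any other key frame at temporal distance d, only
        ceil(B * decay**d) key blocks are kept.
    """
    F, B = num_frames, blocks_per_frame
    mask = np.zeros((F * B, F * B), dtype=bool)
    for qi in range(F):
        for kj in range(F):
            if kj == 0 or kj == qi:
                keep = B
            else:
                keep = int(np.ceil(B * decay ** abs(qi - kj)))
            # Placeholder selection: keep the first `keep` key blocks of frame kj.
            mask[qi * B:(qi + 1) * B, kj * B:kj * B + keep] = True
    return mask


def masked_attention(q, k, v, block_mask, block_size):
    """Reference (dense) attention that applies the block mask at token level.

    This shows the masking semantics only; it still computes every score,
    so it does not realise the compute savings of a real sparse kernel.
    """
    token_mask = block_mask.repeat(block_size, axis=0).repeat(block_size, axis=1)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(token_mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Toy sizes: 4 frames, 4 blocks per frame, 2 tokens per block, 8 channels.
mask = time_decay_block_mask(num_frames=4, blocks_per_frame=4, decay=0.5)
rng = np.random.default_rng(0)
tokens = 4 * 4 * 2
q, k, v = (rng.standard_normal((tokens, 8)) for _ in range(3))
out = masked_attention(q, k, v, mask, block_size=2)
print("fraction of block pairs kept:", mask.mean())
```

The printed fraction is a property of this toy mask only and should not be read against the 56% figure, which the abstract reports for the network's total computation. Realising actual savings would also require a kernel that skips the masked blocks rather than masking a dense score matrix.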
Problem

Research questions and friction points this paper is trying to address.

4D generation
temporal artifacts
computational demand
spatiotemporal dependencies
dynamic shape synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D generation
Sparse Attention
Diffusion Transformer
Temporal Coherence
Spatiotemporal Modeling