🤖 AI Summary
Existing 4D content generation methods depend on skeleton-based motion control, which limits motion diversity and the continuity of fine detail. To address this, we propose a skeleton-free implicit 4D joint representation framework. Our approach builds an integrated low-dimensional latent space that encodes the shape and color of each 3D frame, and a diffusion model operating in that latent space generates the sequence so that frames stay temporally consistent. The framework generates 4D mesh animations conditioned on either images or text. Key contributions are: (i) a skeleton-free latent 4D sequence joint representation; (ii) an integrative latent mapping that unifies shape and color; and (iii) diffusion-based generation in a low-dimensional latent space. Experiments on ShapeNet, 3DBiCar, and DeformingThings4D report improvements over state-of-the-art methods in generating colored 3D shapes and temporally coherent 4D animations.
📝 Abstract
Directly learning to model 4D content, including shape, color, and motion, is challenging. Existing methods depend on skeleton-based motion control and offer limited continuity in detail. To address this, we propose a novel framework that uses integrative latent mapping to generate coherent 4D sequences: animations of 3D shapes whose shape and color evolve dynamically over time under given conditions. We first employ a unified integrative latent representation to encode the shape and color information of each detailed 3D geometry frame. The proposed skeleton-free latent 4D sequence joint representation then allows us to leverage diffusion models in a low-dimensional space to control the generation of 4D sequences. Finally, the generated 4D sequences are temporally coherent and conform well to the input images and text prompts. Extensive experiments on the ShapeNet, 3DBiCar, and DeformingThings4D datasets across several tasks demonstrate that our method effectively learns to generate high-quality colored 3D shapes and 4D mesh animations, improving over the current state of the art. Source code will be released.
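To make the idea of running diffusion over a sequence of per-frame latents more concrete, here is a minimal sketch of the standard DDPM forward (noising) step applied to a whole latent sequence at once. It is an illustration only, not the paper's implementation: the linear noise schedule, the frame count, the latent dimension, and every function name below are assumptions chosen for the example, since the abstract does not specify them.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per diffusion step (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha_bar_t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_latent_sequence(latents, t, alpha_bars, rng):
    """Noise every frame latent to diffusion step t in one shot.

    latents: list of frames, each a list of floats (hypothetical per-frame codes).
    Computes x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    Returns (noisy_latents, noise), both shaped like the input.
    """
    a = alpha_bars[t]
    signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    noise = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in latents]
    noisy = [
        [signal * x + sigma * e for x, e in zip(frame, eps)]
        for frame, eps in zip(latents, noise)
    ]
    return noisy, noise


# Toy sequence: 8 frames, each a 16-dimensional latent (sizes are arbitrary).
rng = random.Random(0)
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(8)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
noisy_sequence, noise = noise_latent_sequence(sequence, 500, alpha_bars, rng)
```

In a full pipeline, a denoising network would be trained to predict `noise` from `noisy_sequence`, the step index, and the image or text condition. Treating all frames as one sample is one plausible way to let the model learn inter-frame consistency, but the abstract does not say how the paper achieves it.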