🤖 AI Summary
This work addresses the challenge of generating dynamic 4D intrinsic assets (time-varying geometry, reflectance, and texture) without manual 3D modeling or animation. Methodologically, it introduces a learnable neural template that encodes temporal latent states, enforces inter-frame consistency via self-supervised image features, and optimizes end to end through temporal-guided distillation from pretrained 2D diffusion models and differentiable rendering. To the authors' knowledge, this is the first approach to enable controllable generation of dynamic 4D intrinsic properties, supporting high-fidelity rendering from novel viewpoints, under arbitrary illumination, and at any time step. Experiments show that the method substantially lowers the barrier to dynamic 3D content creation, producing temporally coherent, physically plausible, and high-fidelity intrinsic sequences across diverse natural phenomena, such as a blossoming rose.
📝 Abstract
We study the problem of generating temporal object intrinsics -- temporally evolving sequences of object geometry, reflectance, and texture, such as a blooming rose -- from pre-trained 2D foundation models. Unlike conventional 3D modeling and animation techniques, which require extensive manual effort and expertise, our method generates such assets using signals distilled from pre-trained 2D diffusion models. To ensure the temporal consistency of the object intrinsics, we propose Neural Templates for temporal-state-guided distillation, derived automatically from self-supervised image features. Our method generates high-quality temporal object intrinsics for several natural phenomena and enables sampling and controllable rendering of these dynamic objects from any viewpoint, under any environmental lighting conditions, at any time in their lifespan. Project website: https://chen-geng.com/rose4d
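The temporal-state-guided distillation described above can be caricatured with a toy gradient loop. This is a hypothetical, heavily simplified sketch under stated assumptions, not the paper's implementation: a shared template plus a per-time-step latent is "rendered" to a scalar, a frozen teacher (standing in for guidance from a pretrained 2D diffusion model) supplies a per-frame gradient, and a smoothness term couples neighboring time steps to encourage inter-frame consistency. All names, the scalar setup, and the quadratic teacher are illustrative.

```python
# Toy sketch of temporal-state-guided distillation (illustrative only; not the
# authors' method). Scalars stand in for images; a quadratic teacher stands in
# for a pretrained 2D diffusion model's guidance signal.
T = 5                       # number of time steps in the object's lifespan
template = 0.0              # shared learnable template
latents = [0.0] * T         # per-time-step latent states

def render(template, latent):
    """'Differentiable renderer': template modulated by a temporal latent."""
    return template + latent

def teacher_grad(image, t):
    """Frozen teacher: gradient of 0.5 * (image - target_t)^2, where the
    time-dependent target t stands in for diffusion-model guidance."""
    return image - float(t)

def smoothness_grad(latents, t):
    """Inter-frame consistency: penalize latent jumps between neighbors."""
    g = 0.0
    if t > 0:
        g += latents[t] - latents[t - 1]
    if t < T - 1:
        g += latents[t] - latents[t + 1]
    return g

lr = 0.05
for step in range(500):
    for t in range(T):
        g = teacher_grad(render(template, latents[t]), t)
        template -= lr * g / T                             # shared signal
        latents[t] -= lr * (g + 0.1 * smoothness_grad(latents, t))

# Each rendered frame should end up near its time-dependent target, with a
# small residual left by the smoothness coupling between neighboring frames.
errors = [abs(render(template, latents[t]) - t) for t in range(T)]
```

The design point the sketch illustrates: the teacher gradient supervises each frame independently, while the smoothness term and the shared template are what tie the sequence together over time.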