🤖 AI Summary
This work addresses the problem of reconstructing high-fidelity PBR materials—albedo, roughness, and metallic maps—from a single image or text prompt, without 3D supervision. The method fine-tunes a video diffusion model to generate multi-view videos with consistent geometry and lighting; applies video intrinsic decomposition to separate static geometry from dynamic illumination; and jointly optimizes material parameters via physics-based differentiable path tracing. To our knowledge, this is the first approach coupling video diffusion models with differentiable physically based rendering to enable direct 2D-to-PBR mapping. Experiments demonstrate significant improvements over state-of-the-art image-to-material methods on complex real-world materials. The resulting PBR textures are engine-ready—seamlessly importable into Blender, Unity, and similar platforms—and support real-time editing and relighting. This framework establishes a new, efficient, and controllable paradigm for material acquisition in content creation.
📝 Abstract
We leverage fine-tuned video diffusion models, intrinsic decomposition of videos, and physically based differentiable rendering to generate high-quality materials for 3D models given a text prompt or a single image. First, we condition a video diffusion model to respect the input geometry and lighting conditions; this model produces multiple views of a given 3D model with coherent material properties. Second, we use a recent intrinsic decomposition model to extract intrinsics (base color, roughness, metallic) from the generated video. Finally, we feed the intrinsics alongside the generated video to a differentiable path tracer to robustly extract PBR materials directly compatible with common content creation tools.
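The final stage, recovering material maps by gradient descent through a differentiable renderer, can be illustrated with a minimal toy analogue. The sketch below is an assumption-laden stand-in, not the paper's method: it replaces the differentiable path tracer with a simplified analytic GGX-style shading model in PyTorch, uses a single known directional light and flat normals, and optimizes per-pixel albedo, roughness, and metallic maps (squashed into [0,1] via a sigmoid) to match a target rendering. All names and the shading formula are illustrative.

```python
import torch
import torch.nn.functional as F

def shade(albedo, roughness, metallic, n, l, v):
    # Simplified Cook-Torrance-style shading (toy model, NOT the paper's path tracer):
    # Lambertian diffuse scaled by (1 - metallic) plus a GGX-distribution specular lobe.
    h = F.normalize(l + v, dim=-1)                      # half vector
    ndl = (n * l).sum(-1, keepdim=True).clamp(min=0.0)  # cosine term
    ndh = (n * h).sum(-1, keepdim=True).clamp(min=0.0)
    diffuse = albedo * (1.0 - metallic) * ndl
    a2 = (roughness ** 2).clamp(min=1e-4)               # GGX alpha^2
    d = a2 / (torch.pi * (ndh ** 2 * (a2 - 1.0) + 1.0) ** 2 + 1e-6)
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic    # base reflectance
    return diffuse + f0 * d * ndl

torch.manual_seed(0)
H = W = 8
# Fixed geometry/lighting: flat normals, one directional light, frontal view.
n = torch.tensor([0.0, 0.0, 1.0]).expand(H, W, 3)
l = F.normalize(torch.tensor([0.3, 0.3, 1.0]), dim=0).expand(H, W, 3)
v = torch.tensor([0.0, 0.0, 1.0]).expand(H, W, 3)

# Ground-truth material we try to recover from its rendering.
gt_albedo = torch.rand(H, W, 3)
gt_rough = torch.full((H, W, 1), 0.4)
gt_metal = torch.full((H, W, 1), 0.2)
target = shade(gt_albedo, gt_rough, gt_metal, n, l, v)

# Optimizable per-pixel material maps: 3 albedo + 1 roughness + 1 metallic channels.
params = torch.zeros(H, W, 5, requires_grad=True)
opt = torch.optim.Adam([params], lr=0.05)
for step in range(400):
    a, r, m = torch.sigmoid(params).split([3, 1, 1], dim=-1)
    loss = F.mse_loss(shade(a, r, m, n, l, v), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.5f}")
```

In the actual pipeline, the target is not a single synthetic image but the generated multi-view video plus its decomposed intrinsics, and the renderer is a physics-based path tracer, which resolves the single-view albedo/specular ambiguity this toy setup still suffers from.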