🤖 AI Summary
This work addresses the challenge of automatically generating physically plausible, multi-channel PBR (Physically Based Rendering) materials from text descriptions and 3D geometry. To this end, it introduces the first video diffusion transformer capable of jointly synthesizing multiple material attributes (base color, roughness, metallic, and height maps) and pairs it with a compact variational autoencoder that maps the multimodal material data into a unified latent space, keeping computational overhead in check. The proposed method produces high-quality, tool-compatible PBR materials that preserve physical realism while significantly accelerating the 3D asset creation pipeline.
📝 Abstract
We present a method for generating physically based materials for 3D shapes, built on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, and jointly models multiple material properties (base color, roughness, metallic, height map) to form physically plausible materials. We further introduce a custom variational autoencoder that encodes multiple material modalities into a compact latent space, enabling joint generation of all modalities without increasing the token count. Given a text prompt, our pipeline generates high-quality materials for 3D shapes that are compatible with common content creation tools.
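The key idea behind the compact latent space can be sketched as follows: because the PBR modalities share the same spatial resolution, they can be packed along the channel axis and encoded as a single latent grid, so the transformer's token count depends only on the spatial size, not on how many modalities are generated. The snippet below is a minimal illustration under assumed shapes; `toy_encode` is a hypothetical stand-in (average pooling plus a random channel projection), not the paper's actual VAE.

```python
import numpy as np

# Assumed toy resolution; the real model's resolutions are not specified here.
H = W = 64
base_color = np.random.rand(H, W, 3)  # RGB
roughness = np.random.rand(H, W, 1)
metallic = np.random.rand(H, W, 1)
height = np.random.rand(H, W, 1)

# Pack all modalities along the channel axis: 3+1+1+1 = 6 channels, same H x W.
material = np.concatenate([base_color, roughness, metallic, height], axis=-1)


def toy_encode(x, factor=8, latent_dim=4):
    """Hypothetical stand-in for the VAE encoder: average-pool the spatial
    dims by `factor`, then project channels to `latent_dim`."""
    h, w, c = x.shape
    pooled = x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    proj = np.random.default_rng(0).standard_normal((c, latent_dim))
    return pooled @ proj


latent = toy_encode(material)
# Transformer token count is (H/8) * (W/8) = 64, regardless of how many
# material channels were packed in.
print(material.shape, latent.shape)  # (64, 64, 6) (8, 8, 4)
```

Encoding the stacked channels jointly (rather than one latent per modality) is what lets the diffusion transformer model correlations between, e.g., base color and roughness without any extra tokens.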