🤖 AI Summary
Existing PBR material generation methods rely on hand-crafted designs and lack a unified representation, hindering coherent modeling of the relationship between RGB appearance and underlying physical properties. This leads to task fragmentation and an inability to leverage large-scale RGB image data. This work proposes the first joint RGB-PBR representation framework, encoding appearance and physical attributes as dual latent variable sequences and modeling them via a structured 5-frame video diffusion architecture to support text/image-to-material generation and intrinsic decomposition. Trained on the newly constructed hybrid dataset MatHybrid-410K, which integrates massive RGB images with high-fidelity PBR data, the model generates materials at native 1024×1024 resolution. It significantly outperforms prior methods in fidelity, diversity, and multi-task generalization, establishing the first general-purpose foundation model for industrial-grade material generation.
📝 Abstract
Physically-based rendering (PBR) materials are fundamental to photorealistic graphics, yet their creation remains labor-intensive and requires specialized expertise. While generative models have advanced material synthesis, existing methods lack a unified representation bridging natural image appearance and PBR properties, leading to fragmented task-specific pipelines and an inability to leverage large-scale RGB image data. We present MatPedia, a foundation model built upon a novel joint RGB-PBR representation that compactly encodes materials into two interdependent latents: one for RGB appearance and one for the four PBR maps encoding complementary physical properties. By formulating them as a 5-frame sequence and employing video diffusion architectures, MatPedia naturally captures their correlations while transferring visual priors from RGB generation models. This joint representation enables a unified framework handling multiple material tasks (text-to-material generation, image-to-material generation, and intrinsic decomposition) within a single architecture. Trained on MatHybrid-410K, a mixed corpus combining PBR datasets with large-scale RGB images, MatPedia achieves native $1024\times1024$ synthesis that substantially surpasses existing approaches in both quality and diversity.
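The 5-frame formulation above can be pictured as stacking the RGB latent and the four PBR-map latents along a frame axis, so a video diffusion backbone treats them as one short clip and models their correlations jointly. The following is a minimal sketch of that layout; the specific map names (albedo, normal, roughness, metallic), the latent shapes, and the function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pack_material_sequence(rgb_latent, pbr_latents):
    """Stack one RGB latent and four PBR-map latents into a (5, C, H, W)
    sequence whose frame axis plays the role of video time.

    This is a hypothetical sketch of the 5-frame layout described in the
    abstract, not the paper's actual implementation.
    """
    pbr_latents = list(pbr_latents)
    assert len(pbr_latents) == 4, "expected four PBR map latents"
    return np.stack([rgb_latent] + pbr_latents, axis=0)

# Toy latents: channel/spatial sizes here are illustrative, not the model's.
C, H, W = 4, 128, 128
rgb = np.zeros((C, H, W))
pbr = {name: np.zeros((C, H, W))
       for name in ("albedo", "normal", "roughness", "metallic")}

seq = pack_material_sequence(rgb, pbr.values())
print(seq.shape)  # (5, 4, 128, 128)
```

Treating the frame axis as time is what lets the model reuse pretrained video diffusion priors: the RGB frame carries appearance, and the four PBR frames must stay consistent with it, exactly the kind of cross-frame coherence such backbones are trained for.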