🤖 AI Summary
Existing material generation methods, whether driven by text or images, struggle with the geometric distortions, oblique viewing angles, and occlusions common in real-world photographs, which hinders high-fidelity virtual environment authoring and inverse rendering. MaterialPicker addresses this by fine-tuning a pre-trained Diffusion Transformer (DiT) video diffusion model into a material generator, treating each material map as a frame in a video sequence. The approach combines multimodal conditioning (text prompts and/or image crops of distorted, angled, or partially occluded material samples) with this sequential material representation, enabling semantic guidance alongside geometric distortion correction. Quantitative and qualitative evaluations show more diverse material generation and better distortion correction than previous work.
📝 Abstract
High-quality material generation is key for virtual environment authoring and inverse rendering. We propose MaterialPicker, a multi-modal material generator leveraging a Diffusion Transformer (DiT) architecture, improving and simplifying the creation of high-quality materials from text prompts and/or photographs. Our method can generate a material based on an image crop of a material sample, even if the captured surface is distorted, viewed at an angle, or partially occluded, as is often the case in photographs of natural scenes. We further allow the user to specify a text prompt to provide additional guidance for the generation. We fine-tune a pre-trained DiT-based video generator into a material generator, where each material map is treated as a frame in a video sequence. We evaluate our approach both quantitatively and qualitatively and show that it enables more diverse material generation and better distortion correction than previous work.
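The central representation trick, treating each material map as one frame of a short "video" so a video diffusion backbone can denoise all maps jointly, can be sketched as below. This is a minimal illustration, not the paper's code: the specific map names, their ordering, and the tensor layout are assumptions made here for clarity.

```python
# Illustrative sketch: pack per-channel material maps into a frame
# sequence for a video diffusion model, and unpack the result.
# MAP_NAMES and the (T, H, W, 3) layout are assumed, not from the paper.
import numpy as np

MAP_NAMES = ["albedo", "normal", "height", "roughness", "metallic"]

def maps_to_video(maps: dict) -> np.ndarray:
    """Stack per-map images of shape (H, W, 3) into a (T, H, W, 3)
    'video' tensor, one frame per material map, in canonical order."""
    frames = [maps[name] for name in MAP_NAMES]
    return np.stack(frames, axis=0)

def video_to_maps(video: np.ndarray) -> dict:
    """Inverse: split a denoised 'video' back into named material maps."""
    return {name: video[i] for i, name in enumerate(MAP_NAMES)}

# Example: five 256x256 material maps become a 5-frame video.
maps = {name: np.zeros((256, 256, 3), dtype=np.float32)
        for name in MAP_NAMES}
video = maps_to_video(maps)
print(video.shape)  # (5, 256, 256, 3)
```

With this packing, the video generator's temporal attention can enforce cross-map consistency (e.g. that roughness and albedo describe the same surface) the same way it enforces frame-to-frame coherence in ordinary video.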