🤖 AI Summary
Existing 3D generative methods suffer from data scarcity and struggle to model multi-channel physically based rendering (PBR) textures, yielding physically inconsistent, low-fidelity outputs. To address this, we propose a multi-view, multi-channel diffusion framework that jointly models shaded and albedo appearance channels, enabling the integration of intrinsic image decomposition for material properties. We further introduce an agentic post-processing module driven by a multimodal large language model, which emulates professional artists' material evaluation and selection workflow. Our approach unifies disentangled texture generation, physics-aware consistency constraints, and semantically controllable editing. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches in visual quality, material realism, and cross-channel consistency. Notably, it yields an end-to-end PBR texture generation pipeline that produces high-fidelity, physically interpretable results.
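To make the multi-channel idea concrete, here is a minimal sketch, not the authors' code, of how a multi-view diffusion model might jointly denoise shaded and albedo channels and then hand the shaded views to an intrinsic decomposition step that recovers PBR maps. The names `diffusion`, `decompose_intrinsics`, and `generate_pbr_views` are hypothetical placeholders, and the channel layout is an assumption for illustration.

```python
import torch

def generate_pbr_views(mesh_cond: torch.Tensor,
                       diffusion,             # hypothetical multi-view, multi-channel model
                       decompose_intrinsics,  # hypothetical shaded -> material decomposer
                       num_views: int = 6):
    # One denoising pass produces, for every camera view, a stack of
    # [shaded RGB | albedo RGB] channels conditioned on the mesh renderings.
    latents = torch.randn(num_views, 6, 64, 64)           # 3 shaded + 3 albedo channels
    images = diffusion.sample(latents, cond=mesh_cond)    # (V, 6, H, W) decoded views

    shaded, albedo = images[:, :3], images[:, 3:]

    # The shaded channel carries lighting cues, so it is the natural input
    # for intrinsic decomposition into roughness / metallic maps.
    roughness, metallic = decompose_intrinsics(shaded, albedo)
    return {"albedo": albedo, "roughness": roughness, "metallic": metallic}
```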
📝 Abstract
Current methods for 3D generation still fall short in physically based rendering (PBR) texturing, primarily due to limited data and challenges in modeling multi-channel materials. In this work, we propose MuMA, a method for 3D PBR texturing through Multi-channel Multi-view generation and Agentic post-processing. Our approach features two key innovations: 1) We opt to model shaded and albedo appearance channels, where the shaded channels enable the integration of intrinsic decomposition modules for material properties. 2) Leveraging multimodal large language models, we emulate artists' techniques for material assessment and selection. Experiments demonstrate that MuMA achieves superior results in visual quality and material fidelity compared to existing methods. A sketch of the agentic selection step appears below.
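The following is a minimal sketch of the agentic post-processing idea, assuming a multimodal LLM scores rendered candidates the way an artist would and the highest-scoring texture is kept. This is an illustration of the concept, not MuMA's actual implementation; `query_mllm` and `render_preview` are hypothetical helpers.

```python
def select_best_texture(candidates, mesh, query_mllm, render_preview):
    # Prompt the multimodal LLM to act as a material reviewer.
    prompt = ("Rate this textured 3D asset from 1 to 10 for material realism, "
              "clean albedo (no baked-in lighting), and absence of seam artifacts. "
              "Reply with a single number.")
    best_tex, best_score = None, float("-inf")
    for tex in candidates:
        preview = render_preview(mesh, tex)               # render a few views for the MLLM
        score = float(query_mllm(image=preview, prompt=prompt))
        if score > best_score:
            best_tex, best_score = tex, score
    return best_tex
```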