AI Summary
To address the scarcity of PBR texture data, the distribution shift caused by freezing embedding networks in existing methods, and challenges including cross-view inconsistency and misalignment between latent and pixel spaces, this paper proposes MatLat, a material-aware latent space framework. MatLat fine-tunes a pre-trained VAE to construct a multi-channel material encoder and jointly optimizes the diffusion-based generation process. It introduces correspondence-aware perceptual attention and a locality-preserving patch alignment regularization in latent space, explicitly enforcing cross-view consistency and mitigating distribution shift. Experiments demonstrate that MatLat significantly outperforms state-of-the-art methods in PBR texture fidelity, achieving consistent improvements in SSIM, LPIPS, and material physical plausibility metrics. Ablation studies validate the essential contribution of each component to overall performance.
Abstract
We propose a generative framework for producing high-quality PBR textures on a given 3D mesh. As large-scale PBR texture datasets are scarce, our approach focuses on effectively leveraging the embedding space and diffusion priors of pretrained latent image generative models while learning a material latent space, MatLat, through targeted fine-tuning. Unlike prior methods that freeze the embedding network and thus lead to distribution shifts when encoding additional PBR channels and hinder subsequent diffusion training, we fine-tune the pretrained VAE so that new material channels can be incorporated with minimal latent distribution deviation. We further show that correspondence-aware attention alone is insufficient for cross-view consistency unless the latent-to-image mapping preserves locality. To enforce this locality, we introduce a regularization in the VAE fine-tuning that crops latent patches, decodes them, and aligns the corresponding image regions to maintain strong pixel-latent spatial correspondence. Ablation studies and comparison with previous baselines demonstrate that our framework improves PBR texture fidelity and that each component is critical for achieving state-of-the-art performance.
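The locality regularization described above (crop a latent patch, decode it, and align the result with the corresponding image region) can be illustrated with a minimal toy sketch. Everything here is an assumption made for illustration: the two "decoders" are hand-written stand-ins rather than the fine-tuned VAE, the latent is a single-channel grid, the reference is the matching crop of the full decode (the paper may align against ground-truth image regions instead), and the function names are hypothetical. The point is only that a decoder which mixes information across latent cells yields a non-zero alignment loss, while a strictly local one yields zero.

```python
def upsample(grid, f):
    """Nearest-neighbour upsampling: each cell becomes an f x f block."""
    h, w = len(grid), len(grid[0])
    return [[grid[r // f][c // f] for c in range(w * f)] for r in range(h * f)]


def box_blur(grid):
    """3x3 box blur with edge clamping; mixes values across neighbouring cells."""
    h, w = len(grid), len(grid[0])

    def at(r, c):
        return grid[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[sum(at(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
             for c in range(w)] for r in range(h)]


def local_decoder(latent, f):
    """Toy decoder whose output pixels depend only on their own latent cell."""
    return upsample(latent, f)


def nonlocal_decoder(latent, f):
    """Toy decoder that blurs the latent first, so pixels depend on neighbours."""
    return upsample(box_blur(latent), f)


def crop(grid, top, left, size):
    """Square crop of a 2D grid."""
    return [row[left:left + size] for row in grid[top:top + size]]


def patch_alignment_loss(decoder, latent, top, left, size, f):
    """MSE between the decoded latent crop and the matching crop of the full decode.

    Assumption: the full decode serves as the reference image here.
    """
    decoded_patch = decoder(crop(latent, top, left, size), f)
    reference = crop(decoder(latent, f), top * f, left * f, size * f)
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(decoded_patch, reference)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# Deterministic 8x8 toy latent and an interior 4x4 patch, upsampling factor 2.
latent = [[float((r * r + 3 * c) % 7) for c in range(8)] for r in range(8)]
local_loss = patch_alignment_loss(local_decoder, latent, top=2, left=2, size=4, f=2)
nonlocal_loss = patch_alignment_loss(nonlocal_decoder, latent, top=2, left=2, size=4, f=2)
print("local decoder loss:", local_loss)
print("non-local decoder loss:", nonlocal_loss)
```

In this toy setting the strictly local decoder incurs no penalty, whereas the blurring decoder is penalized at the patch borders, where the cropped latent no longer sees its original neighbours. Minimizing such a term during VAE fine-tuning would, under these assumptions, push the decoder toward a tighter pixel-latent spatial correspondence.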