🤖 AI Summary
This work addresses the challenge of generating physically based rendering (PBR) materials from single-sentence text descriptions, targeting high-fidelity, physically consistent intrinsic maps (albedo, roughness, metallic, and normals) for downstream tasks such as relighting, editing, and texture synthesis. The proposed method introduces a cross-intrinsic attention mechanism to enforce semantic alignment and detail coherence across the multi-modal material maps. It further incorporates component-wise pretraining driven by strong image priors and a differentiable PBR rendering loss to enhance geometric sharpness and physical plausibility. Evaluated on standard benchmarks, the approach significantly outperforms existing intrinsic decomposition and text-to-material methods, and it is the first to achieve end-to-end, high-quality, semantically controllable joint generation of full PBR material maps. Empirical validation includes room-scale texture generation and interactive scene editing, demonstrating practical applicability and robustness.
📝 Abstract
We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from a text description. In contrast to existing text-to-image models, whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps. This enables the generated outputs to be used for content creation in core graphics applications, facilitating re-lighting, editing, and texture generation tasks. To train our generator, we exploit strong image priors and pre-train separate models for each PBR material component (albedo, roughness, metallic, normals). We then align these models with a new cross-intrinsic attention formulation that concatenates key and value features in a consistent fashion. This allows each output modality to exchange information with the others and yields semantically coherent PBR predictions. To ground each intrinsic component, we propose a rendering loss that provides image-space signals to constrain the model, producing sharp details in the output BRDF properties as well. Our results demonstrate detailed intrinsic generation with strong generalization, outperforming existing intrinsic image decomposition methods applied to generated images by a significant margin. Finally, we show a series of applications, including re-lighting, editing, and text-conditioned room-scale PBR texture generation.
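To make the cross-intrinsic attention idea concrete, here is a minimal sketch of the core operation described in the abstract: each modality's queries attend over key/value features concatenated across all modalities, letting albedo, roughness, metallic, and normal branches share information. This is an illustrative simplification, not the paper's implementation; the function name, single-head formulation, and the choice to share features as both keys and values (omitting learned projections) are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_intrinsic_attention(features):
    """Single-head attention where each modality's queries attend over
    keys/values concatenated from ALL modalities (hypothetical sketch).

    features: dict mapping modality name -> (num_tokens, dim) array.
    Returns a dict with the same keys and per-modality output shapes.
    """
    # Concatenate token features across modalities along the token axis;
    # in the sketch the same concatenated tensor serves as keys and values
    # (learned K/V projections are omitted for brevity).
    kv = np.concatenate(list(features.values()), axis=0)
    out = {}
    for name, q in features.items():
        d = q.shape[-1]
        # Scaled dot-product attention: each query token mixes information
        # from tokens of every intrinsic component, not just its own.
        attn = softmax(q @ kv.T / np.sqrt(d), axis=-1)
        out[name] = attn @ kv
    return out
```

Because the key/value sequence is shared across branches, corresponding regions in the different intrinsic maps see the same attention context, which is the mechanism the abstract credits for semantically coherent joint predictions.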