LumiX: Structured and Coherent Text-to-Intrinsic Generation

📅 2025-12-02
🤖 AI Summary
This work addresses the joint generation of physically consistent intrinsic maps (albedo, irradiance, surface normals, depth, and final color) from a single text prompt. To keep the maps structurally coherent with one another, it proposes LumiX, an end-to-end text-driven diffusion framework with two components: Query-Broadcast Attention, which shares queries across all maps in each self-attention block, and Tensor Low-Rank Adaptation (Tensor LoRA), which models cross-map relations parameter-efficiently and supports stable joint diffusion training. The same framework unifies text-to-intrinsic generation and extends to image-conditioned intrinsic decomposition. Experiments report 23% higher alignment and a better preference score (0.19 vs. -0.41) than the state of the art.

📝 Abstract
We present LumiX, a structured diffusion framework for coherent text-to-intrinsic generation. Conditioned on text prompts, LumiX jointly generates a comprehensive set of intrinsic maps (e.g., albedo, irradiance, normal, depth, and final color), providing a structured and physically consistent description of an underlying scene. This is enabled by two key contributions: 1) Query-Broadcast Attention, a mechanism that ensures structural consistency by sharing queries across all maps in each self-attention block. 2) Tensor LoRA, a tensor-based adaptation that parameter-efficiently models cross-map relations for efficient joint training. Together, these designs enable stable joint diffusion training and unified generation of multiple intrinsic properties. Experiments show that LumiX produces coherent and physically meaningful results, achieving 23% higher alignment and a better preference score (0.19 vs. -0.41) compared to the state of the art, and it can also perform image-conditioned intrinsic decomposition within the same framework.
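The abstract describes Query-Broadcast Attention only as sharing queries across all maps in each self-attention block. The sketch below is one possible reading of that sentence, not the paper's implementation. It assumes the queries are computed once from a single designated map (the hypothetical `query_map` argument) and reused against each map's own keys and values. The single-head layout, the helper names, and the toy inputs are also illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def query_broadcast_attention(maps, w_q, w_k, w_v, query_map=0):
    """maps: list of M token matrices, each N x d (one per intrinsic map).

    Assumption: queries come from one designated map and are broadcast to
    every map; keys and values stay map-specific.
    """
    q = matmul(maps[query_map], w_q)             # N x d, computed once
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs = []
    for tokens in maps:
        k = matmul(tokens, w_k)                  # N x d, per map
        v = matmul(tokens, w_v)                  # N x d, per map
        k_t = [list(col) for col in zip(*k)]     # d x N
        scores = matmul(q, k_t)                  # N x N, shared queries
        attn = [softmax([s * scale for s in row]) for row in scores]
        outputs.append(matmul(attn, v))
    return outputs

# Toy example: two maps with two tokens each, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
albedo = [[1.0, 0.0], [0.0, 1.0]]
normal = [[0.5, 0.5], [1.0, -1.0]]
out = query_broadcast_attention([albedo, normal], identity, identity, identity)
```

Because every map is attended with the same queries, token positions in all outputs are driven by one shared notion of "where to look", which is the intuition behind the structural-consistency claim.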
Problem

Research questions and friction points this paper is trying to address.

Generating multiple intrinsic maps jointly from a single text prompt
Keeping the generated intrinsic maps structurally consistent with one another
Training the joint diffusion model efficiently and stably
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured diffusion framework for text-to-intrinsic generation
Query-Broadcast Attention ensures structural consistency across maps
Tensor LoRA models cross-map relations for efficient joint training
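The summary does not spell out how Tensor LoRA is factorized. As a rough illustration of why a tensor-based adapter can be cheaper than one LoRA per map, the sketch below assumes a CP-style factorization: all maps share the low-rank factors `b` and `a`, and each map only adds a small coefficient vector. This factorization, the function names, and the sizes used (width 320, rank 8) are hypothetical. Only the count of five maps comes from the abstract.

```python
def tensor_lora_delta(coeffs, b, a):
    """Build per-map weight updates from shared low-rank factors.

    coeffs: M x r per-map mixing weights
    b:      d_out x r shared factor
    a:      r x d_in shared factor
    delta[m][i][j] = sum_k coeffs[m][k] * b[i][k] * a[k][j]
    (an assumed CP-style form, not confirmed by the source)
    """
    rank = len(a)
    d_in = len(a[0])
    return [
        [[sum(c[k] * b_row[k] * a[k][j] for k in range(rank)) for j in range(d_in)]
         for b_row in b]
        for c in coeffs
    ]

def param_counts(num_maps, d_out, d_in, rank):
    """Compare one independent LoRA per map with the shared-factor variant."""
    independent = num_maps * rank * (d_out + d_in)
    shared = rank * (d_out + d_in + num_maps)
    return independent, shared

# Tiny worked case: two maps, rank 1, a 2 x 2 weight update.
delta = tensor_lora_delta(
    coeffs=[[1.0], [2.0]],
    b=[[1.0], [0.0]],
    a=[[3.0, 4.0]],
)

# Five maps (albedo, irradiance, normal, depth, final color);
# the layer width and rank are made-up values for illustration.
independent, shared = param_counts(num_maps=5, d_out=320, d_in=320, rank=8)
```

Under these assumed sizes the shared-factor variant needs 5,160 adapter parameters per layer versus 25,600 for five independent adapters, while the per-map coefficients still let each map receive a different update.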