UniLight: A Unified Representation for Lighting

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing lighting representations—such as environment maps, spherical harmonics (SH), irradiance, and text—are modality-isolated, hindering cross-modal alignment and joint utilization. To address this, we propose UniLight, the first unified implicit lighting representation framework. UniLight employs modality-specific encoders, contrastive learning for cross-modal alignment, and an auxiliary SH coefficient prediction task to jointly embed text, images, irradiance, and environment maps. Crucially, it explicitly models directional priors, significantly enhancing cross-modal consistency and transferability. Extensive experiments demonstrate that UniLight achieves state-of-the-art performance across diverse lighting-aware tasks, including cross-modal lighting retrieval, environment map generation, and diffusion-model-based lighting control. By unifying heterogeneous lighting modalities into a coherent latent space, UniLight establishes a flexible and precise foundation for multimodal lighting manipulation.
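The cross-modal alignment described above can be sketched as a CLIP-style symmetric InfoNCE objective between embeddings from two modality-specific encoders (e.g. text and environment map). The function names and the temperature value below are illustrative assumptions, not taken from the paper's code:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def symmetric_infonce(emb_a, emb_b, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss between two batches of
    embeddings from different lighting modalities. Row i of each
    batch is assumed to describe the same lighting condition, so
    the diagonal of the similarity matrix holds the positives."""
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature  # (N, N) cosine similarities

    def nll_of_diagonal(m):
        # mean negative log-softmax of the diagonal entries
        m = m - m.max(axis=1, keepdims=True)
        log_probs = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average over both retrieval directions (a->b and b->a)
    return 0.5 * (nll_of_diagonal(logits) + nll_of_diagonal(logits.T))
```

Training all modality encoders against such pairwise objectives is what pulls text, image, irradiance, and environment-map embeddings into one shared latent space.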

📝 Abstract
Lighting has a strong influence on visual appearance, yet understanding and representing lighting in images remains notoriously difficult. Various lighting representations exist, such as environment maps, irradiance, spherical harmonics, or text, but they are incompatible, which limits cross-modal transfer. We thus propose UniLight, a joint latent space serving as a lighting representation that unifies multiple modalities within a shared embedding. Modality-specific encoders for text, images, irradiance, and environment maps are trained contrastively to align their representations, with an auxiliary spherical-harmonics prediction task reinforcing directional understanding. Our multi-modal data pipeline enables large-scale training and evaluation across three tasks: lighting-based retrieval, environment-map generation, and lighting control in diffusion-based image synthesis. Experiments show that our representation captures consistent and transferable lighting features, enabling flexible manipulation across modalities.
Problem

Research questions and friction points this paper is trying to address.

Existing lighting representations (environment maps, SH, irradiance, text) are mutually incompatible
Incompatibility blocks cross-modal transfer for lighting-based retrieval and generation
No shared representation captures consistent lighting features for manipulation across modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint latent space unifies multiple lighting modalities
Contrastive training aligns text, image, irradiance encoders
Enables cross-modal retrieval, generation, and synthesis control
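The auxiliary SH-prediction task above needs ground-truth spherical-harmonics coefficients, which are conventionally obtained by projecting an environment map onto the low-order real SH basis. A minimal NumPy sketch of that standard projection (band limit, layout, and function names are our own assumptions, not the paper's):

```python
import numpy as np

def sh9_basis(x, y, z):
    """First 9 real spherical-harmonics basis functions (bands l=0..2),
    evaluated at unit direction vectors (x, y, z)."""
    return np.stack([
        0.282095 * np.ones_like(x),        # l=0
        0.488603 * y,                      # l=1, m=-1
        0.488603 * z,                      # l=1, m=0
        0.488603 * x,                      # l=1, m=1
        1.092548 * x * y,                  # l=2, m=-2
        1.092548 * y * z,                  # l=2, m=-1
        0.315392 * (3.0 * z**2 - 1.0),     # l=2, m=0
        1.092548 * x * z,                  # l=2, m=1
        0.546274 * (x**2 - y**2),          # l=2, m=2
    ], axis=-1)

def envmap_to_sh9(envmap):
    """Project an equirectangular environment map (H, W, 3) onto 9 SH
    coefficients per colour channel, weighting each texel by its
    solid angle sin(theta) * dtheta * dphi."""
    h, w, _ = envmap.shape
    theta = (np.arange(h) + 0.5) / h * np.pi        # polar angle
    phi = (np.arange(w) + 0.5) / w * 2.0 * np.pi    # azimuth
    theta, phi = np.meshgrid(theta, phi, indexing="ij")
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    basis = sh9_basis(x, y, z)                      # (H, W, 9)
    d_omega = np.sin(theta) * (np.pi / h) * (2.0 * np.pi / w)
    # integrate radiance * basis over the sphere -> (9, 3)
    return np.einsum("hw,hwk,hwc->kc", d_omega, basis, envmap)
```

Because each coefficient beyond the DC term encodes a directional moment of the radiance, regressing these 9-per-channel values is one natural way to inject the directional prior the summary highlights.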