🤖 AI Summary
Fine-grained illumination control in image generation faces challenges including poor cross-scene generalization and heavy reliance on large-scale annotated datasets. To address this, we propose a lightweight and efficient diffusion-based approach: (1) a LoRA-finetuned irradiance map regressor, and (2) explicit modeling of illumination as token-wise relationships within the self-attention mechanism. We empirically find that injecting illumination guidance during early denoising steps yields superior control. Further integrating classifier guidance enables precise generation under target illumination conditions. Our method achieves state-of-the-art illumination control and visual quality across diverse scenes using only ≤50 training images. It significantly surpasses existing approaches in both parameter efficiency (<0.5% trainable parameters) and data efficiency.
📝 Abstract
Light control in generated images is a difficult task: lighting effects span the entire image and the full frequency spectrum. Most approaches tackle this problem by training on extensive yet domain-specific datasets, limiting the inherent generalization and applicability of the foundational backbones used. Instead, PractiLight is a practical approach that effectively leverages the foundational understanding of recent generative models for the task. Our key insight is that lighting relationships in an image are similar in nature to token interactions in self-attention layers, and hence are best represented there. Based on this and on further analysis of the importance of early diffusion iterations, PractiLight trains a lightweight LoRA regressor to produce the direct irradiance map for a given image, using a small set of training images. We then employ this regressor to incorporate the desired lighting into the generation process of another image using Classifier Guidance. This careful design generalizes well to diverse lighting conditions and image domains. We demonstrate state-of-the-art performance in terms of quality and control, with proven parameter and data efficiency compared to leading works, over a wide variety of scene types. We hope this work affirms that image lighting can feasibly be controlled by tapping into foundational knowledge, enabling practical and general relighting.
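The guidance mechanism described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: a tiny convolution stands in for the LoRA-finetuned irradiance regressor, the `guided_step` helper and all tensor shapes are hypothetical, and the diffusion backbone is omitted entirely so that only the classifier-guidance update on the latent is shown.

```python
import torch

torch.manual_seed(0)

# Assumption: a single Conv2d replaces the paper's LoRA irradiance regressor.
# It maps a 4-channel latent to a 1-channel irradiance map.
regressor = torch.nn.Conv2d(4, 1, kernel_size=3, padding=1)

def guided_step(latent, target_irradiance, scale=0.5):
    """One classifier-guidance update: nudge the latent so the regressor's
    predicted irradiance map moves toward the target lighting."""
    latent = latent.detach().requires_grad_(True)
    pred = regressor(latent)  # predicted irradiance map for this latent
    loss = torch.nn.functional.mse_loss(pred, target_irradiance)
    grad, = torch.autograd.grad(loss, latent)
    return (latent - scale * grad).detach(), loss.item()

latent = torch.randn(1, 4, 16, 16)       # stand-in diffusion latent
target = torch.ones(1, 1, 16, 16)        # toy target lighting condition

# Apply guidance only in the early denoising steps, per the paper's finding
# that early injection yields superior illumination control.
losses = []
for step in range(8):
    if step < 4:  # early steps only; later steps would denoise unguided
        latent, loss = guided_step(latent, target)
        losses.append(loss)
```

Because each update is a gradient step on the irradiance mismatch, the loss decreases over the guided steps, steering the latent toward the target lighting without retraining the backbone.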