LumiCtrl: Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-to-image (T2I) models struggle to precisely control scene illumination, limiting controllable generation of emotion, atmosphere, and aesthetics. To address this, we propose the first physics-guided framework for single-image illumination customization. Our method comprises three key components: (i) physically grounded illuminant augmentation based on Planckian-locus modeling to improve illumination fidelity; (ii) edge-guided prompt learning that disentangles illumination from structure by keeping ControlNet parameters frozen; and (iii) a foreground-focused, background-adaptive masked reconstruction loss for context-aware illumination transfer. Requiring only a single object image, our approach learns dedicated illumination prompts end to end. Extensive evaluations demonstrate significant improvements over state-of-the-art customization methods in illumination accuracy, visual appeal, and scene consistency. A user preference study further validates its effectiveness and perceptual superiority.

📝 Abstract
Current text-to-image (T2I) models have demonstrated remarkable progress in creative image generation, yet they still lack precise control over scene illuminants, which is a crucial factor for content designers aiming to manipulate the mood, atmosphere, and visual aesthetics of generated images. In this paper, we present an illuminant personalization method named LumiCtrl that learns an illuminant prompt given a single image of an object. LumiCtrl consists of three basic components: given an image of the object, our method applies (a) physics-based illuminant augmentation along the Planckian locus to create fine-tuning variants under standard illuminants; (b) edge-guided prompt disentanglement using a frozen ControlNet to ensure prompts focus on illumination rather than structure; and (c) a masked reconstruction loss that focuses learning on the foreground object while allowing the background to adapt contextually, enabling what we call contextual light adaptation. We qualitatively and quantitatively compare LumiCtrl against other T2I customization methods. The results show that our method achieves significantly better illuminant fidelity, aesthetic quality, and scene coherence compared to existing personalization baselines. A human preference study further confirms strong user preference for LumiCtrl outputs. The code and data will be released upon publication.
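The abstract's component (a) generates fine-tuning variants of the single input image under standard illuminants along the Planckian locus. The paper does not publish its exact augmentation recipe; the following is a minimal sketch under stated assumptions, using the well-known Kim et al. cubic approximation of the Planckian locus (CCT to CIE xy chromaticity), the standard D65 XYZ-to-linear-sRGB matrix, and a simple von Kries-style per-channel gain. All function names here are illustrative, not the authors'.

```python
# Sketch of physics-based illuminant augmentation along the Planckian locus.
import numpy as np

def planckian_xy(cct):
    """Approximate CIE 1931 (x, y) of a blackbody at `cct` kelvin (valid 1667-25000 K)."""
    t = float(cct)
    if t <= 4000.0:
        x = (-0.2661239e9 / t**3 - 0.2343589e6 / t**2
             + 0.8776956e3 / t + 0.179910)
    else:
        x = (-3.0258469e9 / t**3 + 2.1070379e6 / t**2
             + 0.2226347e3 / t + 0.240390)
    if t <= 2222.0:
        y = -1.1063814 * x**3 - 1.34811020 * x**2 + 2.18555832 * x - 0.20219683
    elif t <= 4000.0:
        y = -0.9549476 * x**3 - 1.37418593 * x**2 + 2.09137015 * x - 0.16748867
    else:
        y = 3.0817580 * x**3 - 5.87338670 * x**2 + 3.75112997 * x - 0.37001483
    return x, y

# CIE XYZ -> linear sRGB (D65 reference white)
XYZ_TO_SRGB = np.array([[ 3.2404542, -1.5371385, -0.4985314],
                        [-0.9692660,  1.8760108,  0.0415560],
                        [ 0.0556434, -0.2040259,  1.0572252]])

def illuminant_gains(cct):
    """Per-channel linear-RGB gains that tint a neutral image toward `cct`."""
    x, y = planckian_xy(cct)
    xyz = np.array([x / y, 1.0, (1.0 - x - y) / y])  # white point at Y = 1
    rgb = XYZ_TO_SRGB @ xyz
    return rgb / rgb.max()                           # normalize so gains stay <= 1

def augment(image, cct):
    """Relight a float image in linear RGB, shape (H, W, 3), as if lit at `cct` kelvin."""
    return np.clip(image * illuminant_gains(cct), 0.0, 1.0)
```

Sweeping `cct` over a few standard temperatures (e.g. 2500 K, 4000 K, 6500 K, 10000 K) would turn the single object image into a warm-to-cool set of fine-tuning variants; a gain vector near (1, 1, 1) at ~6500 K and a strongly red-biased one at 3000 K are quick sanity checks.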
Problem

Research questions and friction points this paper is trying to address.

Enables precise lighting control in text-to-image generation
Learns illuminant prompts from a single object image
Improves illuminant fidelity, aesthetics, and scene coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-based illuminant augmentation for fine-tuning
Edge-guided prompt disentanglement using frozen ControlNet
Masked reconstruction loss for contextual light adaptation
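The third innovation, the masked reconstruction loss, can be sketched as a weighted MSE: foreground pixels get full weight so the prompt learns the object's illumination, while background pixels get a small weight so the scene remains free to adapt to context. The exact weighting scheme is not given in this summary; `bg_weight` and the pixel-space formulation below are assumptions for illustration (in the actual method the loss would presumably act on diffusion latents).

```python
# Sketch of a foreground-focused, background-adaptive masked reconstruction loss.
import numpy as np

def masked_recon_loss(pred, target, fg_mask, bg_weight=0.1):
    """Weighted MSE.

    pred, target : float arrays of shape (H, W, C)
    fg_mask      : float array of shape (H, W, 1), 1 on the object, 0 elsewhere
    bg_weight    : down-weighting factor for background pixels (assumed value)
    """
    w = fg_mask + bg_weight * (1.0 - fg_mask)  # 1 on foreground, bg_weight on background
    sq_err = (pred - target) ** 2
    # Normalize by total weight x channels so an all-ones mask reduces to plain MSE.
    return float((w * sq_err).sum() / (w.sum() * pred.shape[-1]))
```

With `fg_mask` all ones this reduces to ordinary MSE; with a real mask, background reconstruction errors are penalized only lightly, which is one plausible reading of "contextual light adaptation" from the abstract.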
Muhammad Atif Butt
Ph.D. Candidate, Computer Vision Center, Universitat Autònoma de Barcelona
Computer Vision · Generative AI · Autonomous Driving · Adversarial ML
Kai Wang
Computer Vision Center, Spain; Program of Computer Science, City University of Hong Kong (Dongguan), China; City University of Hong Kong, HK SAR, China
Javier Vazquez-Corral
Computer Vision Center, Spain; Universitat Autònoma de Barcelona, Spain
Joost van de Weijer
Computer Vision Center, Spain; Universitat Autònoma de Barcelona, Spain