AI Summary
Text-to-image diffusion models lack intuitive, fine-grained control over shadow shape, position, and intensity in portrait generation; existing editing approaches either rely on costly real-world light-stage data or suffer from high computational overhead and poor generalization. Method: We propose Shadow Director, a novel framework that, for the first time, decouples and parameterizes shadow attributes directly within the latent space of pre-trained diffusion models. It employs a lightweight shadow estimation network, feature redirection, and a parameterized control mechanism, trained exclusively on a small synthetic dataset (thousands of images) in just a few hours. Contribution/Results: Shadow Director enables real-time, identity-preserving, cross-style shadow editing without requiring retraining. It achieves superior generalization across diverse portrait styles, reduces training cost by two orders of magnitude, and significantly enhances artistic fidelity and controllability.
Abstract
Text-to-image diffusion models excel at generating diverse portraits, but lack intuitive shadow control. Existing editing approaches, applied as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, we introduce Shadow Director, a method that extracts and manipulates hidden shadow attributes within well-trained diffusion models. Our approach uses a small estimation network that requires only a few thousand synthetic images and hours of training, with no costly real-world light-stage data needed. Shadow Director enables parametric and intuitive control over shadow shape, placement, and intensity during portrait generation while preserving artistic integrity and identity across diverse styles. Despite training only on synthetic data built on real-world identities, it generalizes effectively to generated portraits with diverse styles, making it a more accessible and resource-friendly solution.