🤖 AI Summary
Existing image generation methods struggle to achieve controllability, consistency, and photorealism at the same time when editing specific elements of an image, largely because they model physical effects such as shadows and reflections poorly and do not keep layers coherently composited. To address this, this work proposes LASAGNA, a unified framework that jointly generates a foreground object with alpha transparency (RGBA) and a photorealistic background, with multi-condition control via text prompts, foreground content, background context, and location masks. The contributions include the LASAGNA-48K dataset and LASAGNABENCH, described as the first benchmark for layer editing, alongside high-fidelity RGBA compositing, physically plausible lighting and shadow modeling, and a novel training strategy. Experiments indicate that LASAGNA outperforms existing approaches while preserving object identity and visual consistency, enabling diverse high-fidelity image editing applications.
📝 Abstract
Recent image generation models have shown impressive progress, yet they often struggle to yield controllable and consistent results when users attempt to edit specific elements within an existing image. Layered representations enable flexible, user-driven content creation, but existing approaches often fail to produce layers with coherent compositing relationships, and their object layers typically lack realistic visual effects such as shadows and reflections. To overcome these limitations, we propose LASAGNA, a novel, unified framework that generates an image jointly with its constituent layers: a photorealistic background and a high-quality transparent foreground with compelling visual effects. Unlike prior work, LASAGNA efficiently learns correct image composition from a wide range of conditioning inputs (text prompts, foreground, background, and location masks), offering greater controllability for real-world applications. To enable this, we introduce LASAGNA-48K, a new dataset composed of clean backgrounds and RGBA foregrounds with physically grounded visual effects. We also propose LASAGNABENCH, the first benchmark for layer editing. We demonstrate that LASAGNA excels in generating highly consistent and coherent results across multiple image layers simultaneously, enabling diverse post-editing applications that accurately preserve identity and visual effects. LASAGNA-48K and LASAGNABENCH will be publicly released to foster open research in the community. The project page is https://rayjryang.github.io/LASAGNA-Page/.
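To make the layered representation concrete, the sketch below shows how an RGBA foreground layer can be recombined with a background layer using the standard "over" alpha-compositing operator. This is an illustration only: the abstract does not state which compositing formula LASAGNA uses, and the function name `composite_over`, the straight (non-premultiplied) alpha convention, and the toy pixel values are all assumptions. The semi-transparent black pixel stands in for a soft shadow carried in the foreground layer.

```python
def composite_over(foreground_rgba, background_rgb):
    """Blend an RGBA foreground over an RGB background, pixel by pixel.

    Both inputs are row-major grids (lists of rows) of the same size.
    Channel values and alpha are floats in [0, 1]; alpha is assumed to be
    straight (non-premultiplied). Hypothetical helper, not LASAGNA code.
    """
    composite = []
    for fg_row, bg_row in zip(foreground_rgba, background_rgb):
        out_row = []
        for (f_r, f_g, f_b, alpha), (b_r, b_g, b_b) in zip(fg_row, bg_row):
            # "Over" operator: C = alpha * F + (1 - alpha) * B, per channel.
            out_row.append((
                alpha * f_r + (1.0 - alpha) * b_r,
                alpha * f_g + (1.0 - alpha) * b_g,
                alpha * f_b + (1.0 - alpha) * b_b,
            ))
        composite.append(out_row)
    return composite


if __name__ == "__main__":
    # One row of three pixels: an opaque red object pixel, a half-transparent
    # black "shadow" pixel, and a fully transparent pixel.
    foreground = [[(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.5), (0.0, 0.0, 0.0, 0.0)]]
    background = [[(1.0, 1.0, 1.0)] * 3]  # plain white background
    print(composite_over(foreground, background))
```

With these toy values, the opaque pixel hides the background, the shadow pixel darkens it to mid-gray, and the transparent pixel leaves it unchanged. Keeping effects such as shadows in the foreground's alpha channel is what lets a layer be moved or placed over a different background without regenerating the image.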