Controllable Layered Image Generation for Real-World Editing

📅 2026-01-21

🤖 AI Summary

Existing image generation methods struggle to achieve controllability, consistency, and photorealism at the same time when editing specific elements of an image. The main causes are poor modeling of physical effects such as shadows and reflections, and a failure to keep compositing relationships between layers coherent. To address this, this work proposes LASAGNA, a unified framework that generates an image jointly with its layers: a photorealistic background and a transparent (RGBA) foreground. Generation can be conditioned on text prompts, foreground content, background context, and location masks. The contributions include the LASAGNA-48K dataset and LASAGNABENCH, the first benchmark for layer editing (both planned for public release), alongside advances in high-fidelity RGBA compositing, physically plausible modeling of visual effects such as shadows and reflections, and a novel training strategy. Experiments show that LASAGNA outperforms existing approaches in producing consistent, coherent layers while preserving object identity and visual effects, enabling diverse high-fidelity image editing applications.

📝 Abstract

Recent image generation models have shown impressive progress, yet they often struggle to yield controllable and consistent results when users attempt to edit specific elements within an existing image. Layered representations enable flexible, user-driven content creation, but existing approaches often fail to produce layers with coherent compositing relationships, and their object layers typically lack realistic visual effects such as shadows and reflections. To overcome these limitations, we propose LASAGNA, a novel, unified framework that generates an image jointly with its composing layers: a photorealistic background and a high-quality transparent foreground with compelling visual effects. Unlike prior work, LASAGNA efficiently learns correct image composition from a wide range of conditioning inputs (text prompts, foreground, background, and location masks), offering greater controllability for real-world applications. To enable this, we introduce LASAGNA-48K, a new dataset composed of clean backgrounds and RGBA foregrounds with physically grounded visual effects. We also propose LASAGNABENCH, the first benchmark for layer editing. We demonstrate that LASAGNA excels in generating highly consistent and coherent results across multiple image layers simultaneously, enabling diverse post-editing applications that accurately preserve identity and visual effects. LASAGNA-48K and LASAGNABENCH will be publicly released to foster open research in the community. The project page is https://rayjryang.github.io/LASAGNA-Page/.
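The layered output only pays off for editing if the background and the RGBA foreground recombine into the full image. The abstract does not say how LASAGNA composites its layers. The minimal sketch below therefore assumes the standard Porter-Duff "over" operator with straight (non-premultiplied) alpha, which is not necessarily the paper's method. It shows how a foreground whose alpha channel also carries a soft shadow recombines with a background, and how swapping in a new background reuses the same foreground layer unchanged. The function name `composite_over` and the toy pixel values are hypothetical.

```python
import numpy as np


def composite_over(foreground_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Composite a straight-alpha RGBA foreground over an opaque RGB background.

    Assumption: both arrays hold floats in [0, 1]; the foreground has shape
    (H, W, 4) and the background has shape (H, W, 3).
    """
    if foreground_rgba.shape[:2] != background_rgb.shape[:2]:
        raise ValueError("foreground and background must have the same height and width")
    rgb = foreground_rgba[..., :3]
    alpha = foreground_rgba[..., 3:4]  # keep a trailing axis so alpha broadcasts over RGB
    # Porter-Duff "over": C = alpha * C_fg + (1 - alpha) * C_bg
    return alpha * rgb + (1.0 - alpha) * background_rgb


# Toy 2x2 foreground layer (hypothetical values).
fg = np.zeros((2, 2, 4))
fg[0, 0] = [1.0, 0.0, 0.0, 1.0]  # opaque red object pixel
fg[1, 1] = [0.0, 0.0, 0.0, 0.5]  # semi-transparent black pixel, e.g. part of a soft shadow

bg_white = np.full((2, 2, 3), 1.0)  # original background
bg_gray = np.full((2, 2, 3), 0.2)   # replacement background for a "swap background" edit

img_original = composite_over(fg, bg_white)
img_edited = composite_over(fg, bg_gray)  # same foreground layer, shadow carried along
```

Because the shadow lives in the foreground's alpha channel rather than being baked into the background, it darkens whichever background the layer is placed on. This is the property the abstract refers to when it says post-editing preserves visual effects.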

Problem

Research questions and friction points this paper is trying to address.

controllable image editing

layered image generation

realistic visual effects

coherent compositing

image layer decomposition

Innovation

Methods, ideas, or system contributions that make the work stand out.

layered image generation

controllable image editing

physically grounded visual effects