🤖 AI Summary
This work addresses the challenge of preserving cross-layer consistency in AI-based layered image editing, where existing pipelines flatten the layers, edit the result, and decompose it again, which leaks background content into foreground layers and destabilizes alpha channels. The authors propose LimeCross, a training-free, context-conditioned framework that uses a bi-stream attention mechanism to draw contextual cues from the unedited layers. These cues guide text-driven editing of the user-selected RGBA layers while all remaining layers are kept unchanged. The method is presented as the first training-free approach to context-aware layered editing that explicitly safeguards layer integrity and alpha channel fidelity. To support research in this direction, the authors also introduce LayerEditBench, a benchmark of 1500 layered scenes with paired source/target prompts and evaluation protocols covering edit fidelity and alpha channel stability. Experiments show that LimeCross improves layer purity and composite realism over strong editing baselines.
📝 Abstract
Layered image assets are widely used in real-world creative workflows, enabling non-destructive iteration and flexible re-composition. Recent advances in layered image generation and decomposition synthesize or recover layered representations, yet controllable editing of layered images remains challenging. Manual editing requires careful coordination across layers to maintain consistent illumination and contact, while AI-based pipelines collapse layers into a flattened image for editing, then decompose them again, introducing background-to-foreground leakage and unstable transparency. To address these limitations, we propose LimeCross, a training-free context-conditioned layered image editing framework that edits user-selected RGBA layers according to text while keeping the remaining layers unchanged. It leverages contextual cues from other layers using a bi-stream attention mechanism to preserve cross-layer consistency, while explicitly maintaining layer integrity to prevent the contamination of edited layers. To evaluate our approach, we introduce LayerEditBench, a benchmark of 1500 layered scenes with paired source/target prompts, along with evaluation protocols that assess both edit fidelity and alpha channel stability. Extensive experiments demonstrate that LimeCross improves layer purity and composite realism over strong editing baselines, establishing context-conditioned layered editing as a principled framework for controllable generative creation.
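To make the layered setting concrete, here is a minimal Python sketch that composites straight-alpha RGBA layers with the standard Porter-Duff "over" operator and replaces a single layer while leaving the others untouched. It is not LimeCross itself: `recolor_green` is a trivial stand-in for a text-driven edit, `alpha_mae` is an illustrative stability measure assumed here rather than a LayerEditBench metric, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float, float, float]  # straight (non-premultiplied) RGBA in [0, 1]
Layer = List[List[Pixel]]                  # rows of pixels


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over' operator for straight-alpha pixels."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> Layer:
    """Flatten a stack of same-sized layers ordered bottom to top."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        result = [
            [over(t, b) for t, b in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(layer, result)
        ]
    return result


def edit_layer(layers: List[Layer], index: int,
               edit_fn: Callable[[Layer], Layer]) -> List[Layer]:
    """Return a new stack in which only layers[index] is replaced."""
    edited = list(layers)
    edited[index] = edit_fn(layers[index])
    return edited


def alpha_mae(before: Layer, after: Layer) -> float:
    """Mean absolute alpha change (illustrative, not the benchmark's metric)."""
    diffs = [abs(p[3] - q[3])
             for row_p, row_q in zip(before, after)
             for p, q in zip(row_p, row_q)]
    return sum(diffs) / len(diffs)


def recolor_green(layer: Layer) -> Layer:
    """Stand-in edit: change the color, keep every alpha value as it was."""
    return [[(0.0, 1.0, 0.0, a) for (_, _, _, a) in row] for row in layer]


# A 1x2 scene: opaque blue background, red foreground covering only pixel 0.
background: Layer = [[(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]]
foreground: Layer = [[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.0)]]

stack = [background, foreground]
edited_stack = edit_layer(stack, 1, recolor_green)
flattened = composite(edited_stack)
alpha_drift = alpha_mae(foreground, edited_stack[1])
```

Because the edit operates on the layer directly instead of on a flattened image, the background object is reused as-is and the foreground's transparency cannot pick up background pixels, which is the failure mode the abstract attributes to flatten-then-decompose pipelines.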