🤖 AI Summary
This work addresses the generation and decomposition of layered PSD files with transparent alpha channels. Methodologically: (i) we design a spatial-attention-driven multi-layer compositing mechanism that jointly models semantic structure, spatial relationships, and transparency; (ii) we propose an iterative context-erasure decomposition strategy that parses a single input image into editable layers; and (iii) we introduce an RGBA-VAE to preserve the alpha channel during reconstruction. Evaluated on a newly constructed RGBA-layered dataset, our approach outperforms existing image layering methods in generation fidelity, inter-layer structural consistency, and alpha-channel accuracy. To the best of our knowledge, this is the first end-to-end framework that generates fully editable, transparency-aware PSD files directly from text or image inputs. It is built on a diffusion Transformer architecture from the Flux ecosystem.
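To make the role of the alpha channel concrete, the sketch below flattens a stack of RGBA layers with the standard Porter-Duff "over" operator. It is a generic illustration of how transparent layers combine into one image, not the compositing mechanism of the paper. The names `over` and `flatten`, and the choice of straight (non-premultiplied) colors in [0, 1], are assumptions made for this example.

```python
from typing import List, Tuple

# Straight (non-premultiplied) RGBA, every channel in [0, 1]. Assumed convention.
RGBA = Tuple[float, float, float, float]
Layer = List[List[RGBA]]


def over(top: RGBA, bottom: RGBA) -> RGBA:
    """Composite one pixel over another with the Porter-Duff 'over' operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent, so the color is undefined.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers, ordered bottom to top, into a single image."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                result[y][x] = over(pixel, result[y][x])
    return result
```

Flattening is the forward direction of the problem: given the layers, the composite is fully determined. Decomposition has to invert this step, which is why an accurate alpha channel per layer matters.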
📝 Abstract
Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly challenging. We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem that enables both text-to-PSD generation and image-to-PSD decomposition through in-context learning. For text-to-PSD generation, OmniPSD arranges multiple target layers spatially into a single canvas and learns their compositional relationships through spatial attention, producing semantically coherent and hierarchically structured layers. For image-to-PSD decomposition, it performs iterative in-context editing, progressively extracting and erasing textual and foreground components to reconstruct editable PSD layers from a single flattened image. An RGBA-VAE is employed as an auxiliary representation module to preserve transparency without affecting structure learning. Extensive experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation, structural consistency, and transparency awareness, offering a new paradigm for layered design generation and decomposition with diffusion transformers.
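The image-to-PSD procedure described above can be read as an extract-then-erase loop. The following is a minimal control-flow sketch under stated assumptions, not the authors' implementation: `decompose`, `extract`, and `erase` are hypothetical names, the two callables stand in for the model's in-context editing steps, and the text-before-foreground order is only inferred from the wording of the abstract. Images are modelled as toy dictionaries so the loop can run without a model.

```python
from typing import Callable, Dict, List

# Toy stand-in for an image: component name -> content. Hypothetical representation.
Image = Dict[str, str]


def decompose(
    flattened: Image,
    targets: List[str],
    extract: Callable[[Image, str], Image],
    erase: Callable[[Image, Image], Image],
) -> List[Image]:
    """Peel layers off a flattened image one target at a time.

    Each round extracts one component as its own layer, then erases it from
    the working image so the next round sees what lay underneath.
    The remainder after the last round is kept as the background layer.
    """
    layers: List[Image] = []
    current = dict(flattened)
    for target in targets:
        layer = extract(current, target)
        current = erase(current, layer)
        layers.append(layer)
    layers.append(current)
    return layers


def toy_extract(image: Image, target: str) -> Image:
    """Hypothetical extractor: return only the requested component."""
    return {target: image[target]}


def toy_erase(image: Image, layer: Image) -> Image:
    """Hypothetical eraser: remove the extracted component from the image."""
    return {name: content for name, content in image.items() if name not in layer}


if __name__ == "__main__":
    poster = {"text": "Sale!", "foreground": "shoe", "background": "gradient"}
    for layer in decompose(poster, ["text", "foreground"], toy_extract, toy_erase):
        print(layer)
```

In a real system the two callables would be model calls that return an RGBA layer and an inpainted image respectively. The loop structure is the only part this sketch is meant to convey.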