OmniPSD: Layered PSD Generation with Diffusion Transformer

📅 2025-12-09

🤖 AI Summary

This work addresses the generation and decomposition of layered PSD files with transparent alpha channels. Methodologically: (i) for text-to-PSD generation, we arrange multiple target layers on a single canvas and use spatial attention to jointly model their semantic structure, spatial relationships, and transparency; (ii) for image-to-PSD decomposition, we propose an iterative context-erasure strategy that progressively extracts and erases text and foreground components, yielding editable layers from a single flattened image; and (iii) we introduce an RGBA-VAE as an auxiliary representation module that preserves the alpha channel during reconstruction. Evaluated on a newly constructed RGBA layered dataset, our approach significantly outperforms existing image-layering methods in generation fidelity, inter-layer structural consistency, and alpha-channel accuracy. To the best of our knowledge, this is the first end-to-end framework that generates fully editable, transparency-aware PSD files directly from text or image inputs, built on a diffusion transformer from the Flux ecosystem.
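For reference, the "flattened image" that decomposition starts from is what you get by alpha-compositing the layers. The sketch below is standard Porter-Duff "over" compositing with straight (non-premultiplied) alpha and normal blending only; it is not OmniPSD code, and the names `over` and `flatten` are illustrative. It shows why decomposition is ill-posed: different layer stacks can flatten to the same pixel.

```python
from typing import List, Tuple

# One RGBA pixel with straight (non-premultiplied) channels in [0, 1].
Pixel = Tuple[float, float, float, float]
TRANSPARENT: Pixel = (0.0, 0.0, 0.0, 0.0)


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over' with normal blending: place `top` on `bottom`."""
    *top_rgb, a_top = top
    *bot_rgb, a_bot = bottom
    a_out = a_top + a_bot * (1.0 - a_top)
    if a_out == 0.0:
        return TRANSPARENT
    # Each colour is weighted by how much of the final coverage it contributes.
    r, g, b = (
        (ct * a_top + cb * a_bot * (1.0 - a_top)) / a_out
        for ct, cb in zip(top_rgb, bot_rgb)
    )
    return (r, g, b, a_out)


def flatten(layers: List[Pixel]) -> Pixel:
    """Composite a pixel stack ordered bottom (index 0) to top."""
    result = TRANSPARENT
    for layer in layers:
        result = over(layer, result)
    return result


white, half_red, pink = (1.0, 1.0, 1.0, 1.0), (1.0, 0.0, 0.0, 0.5), (1.0, 0.5, 0.5, 1.0)
print(flatten([white, half_red]))  # (1.0, 0.5, 0.5, 1.0)
print(flatten([pink]))             # (1.0, 0.5, 0.5, 1.0): same pixel, different stack
```

A 50% red layer over opaque white yields exactly the same pixel as one opaque pink layer, so the layer structure cannot be read back from the composite alone; a decomposition model has to infer it.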

📝 Abstract

Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly challenging. We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem that enables both text-to-PSD generation and image-to-PSD decomposition through in-context learning. For text-to-PSD generation, OmniPSD arranges multiple target layers spatially into a single canvas and learns their compositional relationships through spatial attention, producing semantically coherent and hierarchically structured layers. For image-to-PSD decomposition, it performs iterative in-context editing, progressively extracting and erasing textual and foreground components to reconstruct editable PSD layers from a single flattened image. An RGBA-VAE is employed as an auxiliary representation module to preserve transparency without affecting structure learning. Extensive experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation, structural consistency, and transparency awareness, offering a new paradigm for layered design generation and decomposition with diffusion transformers.
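The abstract says text-to-PSD generation places several target layers on one canvas but does not give the layout. A minimal sketch, assuming a row-major grid of equally sized panels with unused cells padded by a transparent pixel; `tile_layers` and `split_canvas` are hypothetical helpers, not part of any released OmniPSD code:

```python
from typing import Any, List

Image = List[List[Any]]  # rows of pixels; a pixel can be any value, e.g. an RGBA tuple


def tile_layers(layers: List[Image], cols: int, empty: Any) -> Image:
    """Place equally sized layers row-major on one canvas; pad unused cells with `empty`."""
    h, w = len(layers[0]), len(layers[0][0])
    rows = -(-len(layers) // cols)  # ceiling division
    canvas = [[empty] * (w * cols) for _ in range(h * rows)]
    for idx, layer in enumerate(layers):
        top, left = (idx // cols) * h, (idx % cols) * w
        for y, row in enumerate(layer):
            canvas[top + y][left:left + w] = row
    return canvas


def split_canvas(canvas: Image, h: int, w: int, count: int) -> List[Image]:
    """Cut the first `count` h-by-w panels back out of a row-major canvas."""
    cols = len(canvas[0]) // w
    panels = []
    for idx in range(count):
        top, left = (idx // cols) * h, (idx % cols) * w
        panels.append([row[left:left + w] for row in canvas[top:top + h]])
    return panels


# Three 1x2 "layers" on a 2-column grid; "." marks the transparent padding cell.
layers = [[["a", "a"]], [["b", "b"]], [["c", "c"]]]
canvas = tile_layers(layers, cols=2, empty=".")
print(canvas)  # [['a', 'a', 'b', 'b'], ['c', 'c', '.', '.']]
```

Because every layer lives in the same image, the model's ordinary spatial attention can relate any region of one layer to any region of another, which is the motivation the abstract gives for this arrangement; splitting the generated canvas then recovers the individual layers.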

Problem

Research questions and friction points this paper is trying to address.

Generates layered PSD files from text descriptions

Decomposes flattened images into editable PSD layers

Preserves transparency and structure in layered design
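The iterative decomposition described in the abstract (extract a text or foreground component, erase it, repeat) can be outlined as a control loop. This is a sketch under stated assumptions: `extract` and `erase` are hypothetical stand-ins for the model's in-context editing calls, `extract` returns `None` when nothing remains, and `max_steps` is an arbitrary safety cap.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Img = TypeVar("Img")      # whatever representation the editing model consumes
Layer = TypeVar("Layer")  # one extracted RGBA layer


def decompose(
    image: Img,
    extract: Callable[[Img], Optional[Layer]],
    erase: Callable[[Img, Layer], Img],
    max_steps: int = 8,
) -> Tuple[Img, List[Layer]]:
    """Peel layers off a flattened image until `extract` finds nothing more.

    Returns the residual background and the layers in extraction order
    (top-most first). `extract` and `erase` are hypothetical model calls.
    """
    layers: List[Layer] = []
    for _ in range(max_steps):
        layer = extract(image)
        if layer is None:
            break
        layers.append(layer)
        image = erase(image, layer)  # remove the layer and fill the area it covered
    return image, layers


# Toy stand-ins: an "image" is a tuple of visible elements, top-most last.
background, layers = decompose(
    ("bg", "photo", "title"),
    extract=lambda im: im[-1] if len(im) > 1 else None,
    erase=lambda im, _layer: im[:-1],
)
print(background, layers)  # ('bg',) ['title', 'photo']
```

Placing the background under the reversed layer list gives the bottom-to-top order used when compositing the layers back together.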

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified diffusion framework for text-to-PSD and image-to-PSD tasks

Spatial attention learns compositional relationships among multiple layers

RGBA-VAE preserves transparency without affecting structure learning
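To illustrate what spatial attention over a shared canvas provides, here is generic scaled dot-product attention for one query token, plus a helper that measures how much attention lands on tokens from other panels (layers). This is textbook attention with made-up vectors, not OmniPSD's Flux-based attention blocks; `attention_weights` and `cross_panel_mass` are hypothetical names.

```python
import math
from typing import List, Sequence


def attention_weights(query: Sequence[float], keys: List[Sequence[float]]) -> List[float]:
    """Softmax of scaled dot products between one query and every key."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_panel_mass(weights: List[float], panel_ids: List[int], own_panel: int) -> float:
    """Total attention weight placed on tokens that belong to other panels (layers)."""
    return sum(w for w, p in zip(weights, panel_ids) if p != own_panel)


# Four tokens on one tiled canvas: two from panel 0, two from panel 1.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0]] * 4)
print(weights)                                     # [0.25, 0.25, 0.25, 0.25]
print(cross_panel_mass(weights, [0, 0, 1, 1], 0))  # 0.5
```

With identical keys the weights are uniform and half of the attention mass crosses panels; in a trained model, the learned queries and keys decide how strongly each layer attends to the others.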
