PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-layer image generation methods struggle to simultaneously ensure global layout coherence, physically plausible inter-layer interactions (e.g., shadows, reflections), and high-fidelity transparency. To address this, we propose PSDiffusion—the first end-to-end unified diffusion framework capable of jointly synthesizing an RGB background and multiple RGBA foreground layers in a single forward pass. Its core innovations are: (1) a global-interlayer diffusion mechanism that replaces conventional sequential layer generation and post-hoc decomposition; (2) joint latent-space modeling across layers with cross-layer attention for spatial-visual alignment; and (3) a dual-path conditional control architecture decoupling layout and appearance guidance. Experiments demonstrate that PSDiffusion significantly improves alpha-matting accuracy and physical plausibility, achieving state-of-the-art performance on multi-layer compositing tasks—including complex occlusion, soft shadows, and transparent reflections—while preserving photorealistic fidelity.
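The cross-layer attention in innovation (2) can be sketched as follows: tokens from each layer's latent attend over the concatenated tokens of all layers, so every layer is denoised with awareness of the others. This is an illustrative NumPy sketch, not the paper's implementation; the function name, single-head formulation, and omission of learned query/key/value projections are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_layer_attention(layer_tokens, d_k):
    # layer_tokens: list of (T, d) token arrays, one per image layer
    # (the RGB background plus each RGBA foreground).
    # Each layer's queries attend over keys/values pooled from ALL
    # layers, giving the spatial-visual alignment described above.
    all_tokens = np.concatenate(layer_tokens, axis=0)       # (L*T, d)
    outputs = []
    for q in layer_tokens:
        scores = q @ all_tokens.T / np.sqrt(d_k)            # (T, L*T)
        attn = softmax(scores, axis=-1)                     # rows sum to 1
        outputs.append(attn @ all_tokens)                   # (T, d)
    return outputs
```

In a real model the queries, keys, and values would each pass through learned linear projections, and attention would be multi-headed; the pooling-across-layers structure is the point here.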

📝 Abstract
Diffusion models have made remarkable advances in generating high-quality images from textual descriptions. Recent works like LayerDiffuse have extended the previous single-layer, unified image generation paradigm to transparent image layer generation. However, existing multi-layer generation methods fail to handle interactions among layers, such as a rational global layout, physically plausible contacts, and visual effects like shadows and reflections, while maintaining high alpha quality. To solve this problem, we propose PSDiffusion, a unified diffusion framework for simultaneous multi-layer text-to-image generation. Our model automatically generates multi-layer images, with one RGB background and multiple RGBA foregrounds, through a single feed-forward process. Unlike existing methods that combine multiple tools for post-hoc decomposition or generate layers sequentially and separately, our method introduces a global-layer interactive mechanism that generates layered images concurrently and collaboratively, ensuring not only high quality and completeness for each layer, but also spatial and visual interactions among layers for global coherence.
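The output format described above, one RGB background plus multiple RGBA foregrounds, recomposes into a single image via standard back-to-front alpha compositing (the Porter-Duff "over" operator). A minimal sketch; the function name and array layout are assumptions, not taken from the paper:

```python
import numpy as np

def composite_layers(background, foregrounds):
    # background:  (H, W, 3) RGB image with values in [0, 1].
    # foregrounds: list of (H, W, 4) RGBA layers, ordered back to front.
    # Each foreground is blended onto the running composite using its
    # alpha channel: out = a * fg + (1 - a) * out  (the "over" operator).
    out = background.astype(np.float64)
    for layer in foregrounds:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out
```

This is why per-layer alpha quality matters: any matting error in a foreground layer shows up directly as halos or seams in the recomposited image.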
Problem

Research questions and friction points this paper is trying to address.

Handling interactions among multiple image layers
Ensuring a rational global layout and physically plausible contacts
Maintaining high alpha (transparency) quality alongside visual effects such as shadows and reflections
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified diffusion framework for multi-layer generation
Global-layer interactive mechanism for coherence
Single feed-forward process for layered images