PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

📅 2025-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multi-layer transparent image generation is held back by the lack of large, high-quality datasets. To address this, the authors release PrismLayers, an open, high-fidelity dataset of 200K multi-layer transparent images with accurate alpha mattes, together with the 20K-image PrismLayersPro. The data is produced by a training-free synthesis pipeline built on off-the-shelf diffusion models, with two components: LayerFLUX, which generates single transparent layers, and MultiLayerFLUX, which composes multiple LayerFLUX outputs into a complete image guided by a human-annotated semantic layout. A filtering stage removes artifacts and semantic mismatches before a final human selection. Fine-tuning the ART model on PrismLayersPro yields ART+, which is preferred over the original ART in 60% of head-to-head user-study comparisons and matches the visual quality of FLUX.1-[dev]. The dataset and the ART+ model are released openly.

📝 Abstract

Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as editing text outputs from LLMs. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. In this paper, we address this fundamental challenge by: (i) releasing the first open, ultra-high-fidelity PrismLayers (PrismLayersPro) dataset of 200K (20K) multi-layer transparent images with accurate alpha mattes, (ii) introducing a training-free synthesis pipeline that generates such data on demand using off-the-shelf diffusion models, and (iii) delivering a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models. The key technical contributions include: LayerFLUX, which excels at generating high-quality single transparent layers with accurate alpha mattes, and MultiLayerFLUX, which composes multiple LayerFLUX outputs into complete images, guided by human-annotated semantic layout. To ensure higher quality, we apply a rigorous filtering stage to remove artifacts and semantic mismatches, followed by human selection. Fine-tuning the state-of-the-art ART model on our synthetic PrismLayersPro yields ART+, which outperforms the original ART in 60% of head-to-head user study comparisons and even matches the visual quality of images generated by the FLUX.1-[dev] model. We anticipate that our work will establish a solid dataset foundation for the multi-layer transparent image generation task, enabling research and applications that require precise, editable, and visually compelling layered imagery.
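To make "composing transparent layers into a complete image" concrete, the sketch below stacks RGBA layers back to front with the standard Porter-Duff "over" operator. It illustrates what an alpha matte does during compositing; it is not the paper's MultiLayerFLUX implementation. The `Layer` record, the `over` and `composite` functions, and the use of a top-left offset as a stand-in for a layout box are all assumptions made for this example.

```python
from dataclasses import dataclass

# Straight (non-premultiplied) RGBA with every channel in [0, 1].
Pixel = tuple[float, float, float, float]


@dataclass
class Layer:
    """Hypothetical layer record: an RGBA patch plus its top-left canvas position."""
    pixels: list[list[Pixel]]  # rows of RGBA pixels
    x: int
    y: int


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': place `top` on `bottom`, weighting by the alpha mattes."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent, so the color is undefined; return clear.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: list[Layer], width: int, height: int) -> list[list[Pixel]]:
    """Composite layers onto a transparent canvas, first layer at the back."""
    canvas = [[(0.0, 0.0, 0.0, 0.0) for _ in range(width)] for _ in range(height)]
    for layer in layers:
        for dy, row in enumerate(layer.pixels):
            for dx, px in enumerate(row):
                cx, cy = layer.x + dx, layer.y + dy
                if 0 <= cx < width and 0 <= cy < height:  # clip to the canvas
                    canvas[cy][cx] = over(px, canvas[cy][cx])
    return canvas


# Usage: an opaque blue 2x2 background and a half-transparent red pixel at (1, 1).
background = Layer([[(0.0, 0.0, 1.0, 1.0)] * 2 for _ in range(2)], x=0, y=0)
foreground = Layer([[(1.0, 0.0, 0.0, 0.5)]], x=1, y=1)
image = composite([background, foreground], width=2, height=2)
print(image[1][1])  # equal parts red and blue, fully opaque
```

Because each layer keeps its own alpha matte, any single layer can be moved, replaced, or removed and the image recomposited without touching the others, which is the kind of per-layer editability the abstract describes.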

Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality multi-layer transparent image datasets

Need for training-free synthesis of multi-layer transparent images

Developing models matching modern text-to-image generation aesthetics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Released the PrismLayers dataset of 200K multi-layer transparent images and the 20K-image PrismLayersPro

Training-free synthesis pipeline using off-the-shelf diffusion models

ART+, the ART model fine-tuned on PrismLayersPro, for multi-layer generation; LayerFLUX and MultiLayerFLUX for synthesizing the layered data

🔎 Similar Papers

ReviveDiff: A Universal Diffusion Model for Restoring Images in Adverse Weather Conditions

2024-09-27, arXiv.org, Citations: 0
