AI Summary
Compositing RGBA layers into a raster image is irreversible, which prevents layer-level editing afterward; existing matting and inpainting methods are limited in segmentation precision and controllability. To address these challenges, we propose Controllable Layer Decomposition (CLD), a framework that combines LayerDecompose-DiT (LD-DiT), a diffusion-Transformer-based module that decouples image elements into distinct layers, with a Multi-Layer Conditional Adapter (MLCA) that injects target image information into multi-layer tokens, enabling fine-grained, controllable RGBA layer decomposition. We build a new design-oriented multi-layer decomposition benchmark and introduce corresponding evaluation metrics. Experiments show that our method consistently outperforms existing methods in decomposition quality and controllability. The generated layers can be imported and manipulated directly in mainstream design tools (e.g., PowerPoint), which supports its use in real-world creative workflows.
Abstract
This work presents Controllable Layer Decomposition (CLD), a method for achieving fine-grained and controllable multi-layer separation of raster images. In practical workflows, designers typically generate and edit each RGBA layer independently before compositing them into a final raster image. However, this process is irreversible: once composited, layer-level editing is no longer possible. Existing methods commonly rely on image matting and inpainting, but remain limited in controllability and segmentation precision. To address these challenges, we propose two key modules: LayerDecompose-DiT (LD-DiT), which decouples image elements into distinct layers and enables fine-grained control; and Multi-Layer Conditional Adapter (MLCA), which injects target image information into multi-layer tokens to achieve precise conditional generation. To enable a comprehensive evaluation, we build a new benchmark and introduce tailored evaluation metrics. Experimental results show that CLD consistently outperforms existing methods in both decomposition quality and controllability. Furthermore, the separated layers produced by CLD can be directly manipulated in commonly used design tools such as PowerPoint, highlighting its practical value and applicability in real-world creative workflows.
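To make the irreversibility claim concrete, the following is a minimal sketch of standard alpha compositing (the Porter-Duff "over" operator) in NumPy. It is not the CLD method or any code from the authors; the function name `composite_over` and the two toy layer stacks are illustrative assumptions. It shows two different layer stacks that flatten to the same raster image, which is why the layers cannot be recovered from the composite alone without extra priors or a learned model.

```python
import numpy as np

def composite_over(layers):
    """Flatten RGBA layers (bottom first) with the Porter-Duff 'over' operator.

    Each layer is a float array of shape (H, W, 4) with values in [0, 1]
    and non-premultiplied alpha. Hypothetical helper for illustration only.
    """
    h, w, _ = layers[0].shape
    out_rgb = np.zeros((h, w, 3))  # premultiplied color accumulator
    out_a = np.zeros((h, w, 1))    # accumulated alpha
    for layer in layers:
        rgb, a = layer[..., :3], layer[..., 3:4]
        out_rgb = rgb * a + out_rgb * (1.0 - a)
        out_a = a + out_a * (1.0 - a)
    # Un-premultiply, avoiding division by zero where the result is transparent.
    safe_a = np.where(out_a > 0, out_a, 1.0)
    return np.concatenate([out_rgb / safe_a, out_a], axis=-1)

RED = np.array([1.0, 0.0, 0.0, 1.0])
BLUE = np.array([0.0, 0.0, 1.0, 1.0])

# Stack A: opaque red background, plus a foreground layer with one blue pixel.
bg_a = np.tile(RED, (2, 2, 1))
fg_a = np.zeros((2, 2, 4))
fg_a[0, 0] = BLUE

# Stack B: the blue pixel is painted into the background; foreground is empty.
bg_b = np.tile(RED, (2, 2, 1))
bg_b[0, 0] = BLUE
fg_b = np.zeros((2, 2, 4))

flat_a = composite_over([bg_a, fg_a])
flat_b = composite_over([bg_b, fg_b])

# Two different layer stacks, one identical composite.
same = np.allclose(flat_a, flat_b)
```

Because `flat_a` and `flat_b` are identical, any decomposition method has to choose among many valid layer stacks, which is where controllability over the output layers becomes relevant.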