OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation

📅 2025-11-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing generative models predominantly focus on RGB synthesis and lack unified support for RGBA image generation and editing; single-task alpha models generalize poorly, while mainstream multi-task frameworks omit the transparency (alpha) channel entirely. Method: We propose the first unified, sequence-to-sequence RGBA multi-task framework, covering 21 diverse generation and editing tasks. Our approach introduces the bidirectionally extended MSRoPE-BiL positional encoding, constructs the high-fidelity multi-layer AlphaLayers dataset, and employs a Diffusion Transformer to enable joint multi-input, multi-target modeling. Results: On AIM-500, our method achieves an 84.8% relative reduction in the Sum of Absolute Differences (SAD); for layer-conditioned alpha completion, it attains over 90% human preference, consistently outperforming task-specific baselines across all evaluated dimensions.
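The SAD metric cited above is, in the matting literature, the Sum of Absolute Differences between the predicted and ground-truth alpha mattes, conventionally reported in thousands. A minimal sketch of that standard metric follows; the paper's exact evaluation protocol (resolution, scaling, trimap handling) is not given here, so treat this as illustrative:

```python
import numpy as np

def sad(alpha_pred: np.ndarray, alpha_gt: np.ndarray) -> float:
    """Sum of Absolute Differences between two alpha mattes in [0, 1],
    divided by 1000 as is conventional in matting benchmarks."""
    return float(np.abs(alpha_pred - alpha_gt).sum()) / 1000.0

# Toy 2x2 mattes: one pixel differs by 0.5.
pred = np.array([[0.0, 0.5], [1.0, 1.0]])
gt   = np.array([[0.0, 1.0], [1.0, 1.0]])
print(sad(pred, gt))  # 0.0005
```

An "84.8% relative reduction" on AIM-500 then means the unified model's SAD is 15.2% of the baseline's, averaged over the benchmark.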

📝 Abstract
Generative models have excelled in RGB synthesis, but real-world applications require RGBA manipulation. This has led to a fragmented landscape: specialized, single-task models handle alpha but lack versatility, while unified multi-task frameworks are confined to the RGB domain. To bridge this critical gap, we propose OmniAlpha, the first unified, multi-task generative framework for sequence-to-sequence RGBA image generation and editing. Its architecture features MSRoPE-BiL, a novel RoPE method with a bi-directionally extendable layer axis for its Diffusion Transformer (DiT) backbone, enabling the concurrent processing of multiple input and target RGBA layers. To power this framework, we introduce AlphaLayers, a new dataset of 1,000 high-quality, multi-layer triplets, built via a novel automated synthesis and filtering pipeline. We jointly train OmniAlpha on this dataset across a comprehensive suite of 21 diverse tasks, and extensive experiments demonstrate that our unified approach consistently outperforms strong, specialized baselines. Most notably, OmniAlpha achieves a dramatic 84.8% relative reduction in SAD for mask-free matting on AIM-500 and wins over 90% of human preferences in layer-conditioned completion. Our work proves that a unified, multi-task model can learn a superior shared representation for RGBA, paving the way for more powerful, layer-aware generative systems.
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between specialized RGBA models and unified frameworks
Enabling concurrent multi-task RGBA image generation and editing
Creating a shared representation for diverse layer-aware generative tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multi-task RGBA sequence-to-sequence generation framework
MSRoPE-BiL RoPE method with bidirectional extendable layer axis
AlphaLayers dataset with automated synthesis pipeline for training
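The paper does not include code here, but the "bidirectionally extendable layer axis" idea behind MSRoPE-BiL can be illustrated with plain 1D RoPE applied along a layer dimension, where input layers take negative positions and target layers non-negative ones, so either side of the sequence can grow without re-indexing the other. Every name and design choice in this sketch (the signed-index scheme, function names, NumPy implementation) is an illustrative assumption, not the authors' actual MSRoPE-BiL:

```python
import numpy as np

def rope_angles(positions: np.ndarray, dim: int, base: float = 10000.0) -> np.ndarray:
    """Rotary angles for signed positions along one axis (here: the layer axis)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (n_layers, dim // 2)

def apply_rope(x: np.ndarray, positions: np.ndarray) -> np.ndarray:
    """Rotate consecutive feature pairs of x (n_layers, dim) by their angles."""
    ang = rope_angles(positions, x.shape[-1])
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Bi-directional layer indexing: input layers get negative positions,
# target layers non-negative ones. Appending more inputs extends the
# axis leftward; appending more targets extends it rightward.
n_inputs, n_targets, dim = 2, 3, 8
positions = np.concatenate([-np.arange(1, n_inputs + 1)[::-1], np.arange(n_targets)])
x = np.random.randn(n_inputs + n_targets, dim)
y = apply_rope(x, positions)
```

Because RoPE rotations are norm-preserving and relative, attention between any two layers depends only on their signed position difference, which is what lets inputs and targets be interleaved and extended in both directions.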
👥 Authors
Hao Yu (Tsinghua University)
Jiabo Zhan (Tsinghua University)
Zile Wang (Tsinghua University)
Jinglin Wang (Beijing University of Posts and Telecommunications)
Huaisong Zhang (Tsinghua University)
Hongyu Li (Beihang University)
Xinrui Chen (Tsinghua University; Efficient Deep Learning, Computer Vision)
Yongxian Wei (Tsinghua University; Machine Learning)
Chun Yuan (Tsinghua University)