DreamO: A Unified Framework for Image Customization

📅 2025-04-23
🤖 AI Summary
Existing image customization methods are mostly designed for a single task and cannot flexibly combine different types of conditions such as identity, subject, style, and background. DreamO addresses this with a unified image customization framework built on a Diffusion Transformer (DiT) that processes the different input types uniformly. It introduces two key components: a feature routing constraint that helps the model query the relevant information from each reference image, and a placeholder strategy that ties specific placeholders to conditions at particular positions so that the placement of each condition in the output can be controlled. The model is trained on a large-scale dataset covering multiple customization tasks using a three-stage progressive strategy: an initial stage on simple tasks with limited data, a full-scale training stage, and a final quality alignment stage. Experiments show that DreamO performs a variety of customization tasks with high quality and flexibly integrates different types of control conditions.

📝 Abstract
Recently, extensive research on image customization (e.g., identity, subject, style, and background) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, which limits their ability to combine different types of conditions. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions. Specifically, DreamO utilizes a diffusion transformer (DiT) framework to uniformly process inputs of different types. During training, we construct a large-scale training dataset that includes various customization tasks, and we introduce a feature routing constraint to facilitate the precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over the placement of conditions in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance the customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that the proposed DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.
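The three-stage progressive strategy described in the abstract can be pictured as a simple step-indexed schedule. The sketch below is illustrative only: the `Stage` class, the `stage_at` helper, the stage names, the task labels, and all step counts are hypothetical placeholders, since the abstract gives no such details.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    tasks: Tuple[str, ...]  # task labels sampled in this stage (placeholders)
    num_steps: int          # placeholder length; not taken from the paper


def stage_at(step: int, schedule: List[Stage]) -> Stage:
    """Return the stage that a global training step falls into."""
    if step < 0:
        raise ValueError("step must be non-negative")
    boundary = 0
    for stage in schedule:
        boundary += stage.num_steps
        if step < boundary:
            return stage
    return schedule[-1]  # past the end: stay in the last stage


# Hypothetical schedule mirroring the three stages named in the abstract.
SCHEDULE = [
    Stage("initial", ("simple_task",), 100),
    Stage("full_scale", ("identity", "subject", "style", "background"), 1000),
    Stage("quality_alignment", ("identity", "subject", "style", "background"), 50),
]
```

A training loop would call `stage_at(step, SCHEDULE)` at each step to decide which tasks and data to sample from.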
Problem

Research questions and friction points this paper is trying to address.

Developing a unified framework for diverse image customization tasks
Enabling seamless integration of multiple control conditions
Achieving high-quality results across various customization scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a diffusion transformer (DiT) to process inputs of different types uniformly
Introduces a feature routing constraint for precise querying of relevant information from reference images
Employs a progressive three-stage training strategy
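To make the feature routing idea more concrete, here is a minimal sketch of one way such a constraint could be written. The summary does not give the formula, so this is an assumed formulation: the mean dot-product similarity between each generated-image token and the reference-image tokens is pushed toward a target mask that marks where the reference content should appear. The function name `routing_loss` and its arguments are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def routing_loss(cond_tokens: Matrix, image_tokens: Matrix, mask: List[float]) -> float:
    """MSE between the mean condition-to-image similarity and a target mask.

    cond_tokens:  one feature vector per reference-image token
    image_tokens: one feature vector per generated-image token
    mask:         1.0 where the reference content should appear, else 0.0
    """
    if not cond_tokens:
        raise ValueError("need at least one condition token")
    if len(mask) != len(image_tokens):
        raise ValueError("mask needs one entry per image token")
    response = []
    for img in image_tokens:
        # Average dot-product similarity of this image position to all condition tokens.
        sims = [sum(c * x for c, x in zip(cond, img)) for cond in cond_tokens]
        response.append(sum(sims) / len(sims))
    # Mean squared error between the response map and the target mask.
    return sum((r - m) ** 2 for r, m in zip(response, mask)) / len(mask)
```

In this toy form the loss is zero when image positions inside the mask align with the reference features and positions outside it do not, which is the behavior a routing constraint is meant to encourage.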
Authors

Chong Mou
Peking University
Diffusion Model, AI Generated Content, Low-level Computer Vision
Yanze Wu
ByteDance
Computer vision
Wenxu Wu
Intelligent Creation Team, ByteDance
Zinan Guo
Intelligent Creation Team, ByteDance
Pengze Zhang
Intelligent Creation Team, ByteDance
Yufeng Cheng
Intelligent Creation Team, ByteDance
Yiming Luo
PhD student, The University of Hong Kong
Robotics
Fei Ding
Unknown affiliation
Shiwen Zhang
Intelligent Creation Team, ByteDance
Xinghui Li
Intelligent Creation Team, ByteDance
Mengtian Li
Intelligent Creation Team, ByteDance
Songtao Zhao
Intelligent Creation Team, ByteDance
Jian Zhang
School of Electronic and Computer Engineering, Peking University
Qian He
ByteDance
Xinglong Wu
Algorithm engineer, ByteDance
Artificial intelligence