UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor identity consistency and severe identity confusion in multi-reference image customization, this paper proposes the Unified Multi-identity Optimization (UMO) framework. It formulates multi-identity generation as a global assignment optimization problem under a "multi-to-multi matching" paradigm, and introduces a matching reward and a dedicated metric for identity confusion. Training uses reinforcement learning on diffusion models, supported by a scalable multi-reference customization dataset with both synthesized and real parts. Reported gains over prevailing customization methods are a 12.3% improvement in ID retention rate and a 38.7% reduction in identity confusion, with state-of-the-art identity preservation among open-source methods. The code and models are publicly available.
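
As a reading aid, the "global assignment" idea can be written in one standard form. This is an illustrative formulation, not the paper's exact reward: the symbols $N$ (number of identities), $r_i$ (embedding of reference identity $i$), $g_j$ (embedding of the $j$-th face detected in the generated image), and $s(\cdot,\cdot)$ (a face-similarity score) are all assumptions introduced here.

```latex
% Assumed notation: S_N is the set of permutations of {1, ..., N}.
\sigma^{*} \;=\; \arg\max_{\sigma \in S_N} \sum_{i=1}^{N} s\bigl(r_i,\, g_{\sigma(i)}\bigr),
\qquad
R \;=\; \frac{1}{N} \sum_{i=1}^{N} s\bigl(r_i,\, g_{\sigma^{*}(i)}\bigr)
```

Under this reading, each reference is paired with exactly one generated face, so two references cannot both be credited for the same face, which is the failure mode described as identity confusion.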

📝 Abstract
Recent advancements in image customization exhibit a wide range of application prospects due to stronger customization capabilities. However, because humans are especially sensitive to faces, a significant challenge remains in preserving consistent identity while avoiding identity confusion with multi-reference images, which limits the identity scalability of customization models. To address this, we present UMO, a Unified Multi-identity Optimization framework, designed to maintain high-fidelity identity preservation and alleviate identity confusion with scalability. With a "multi-to-multi matching" paradigm, UMO reformulates multi-identity generation as a global assignment optimization problem and improves multi-identity consistency across existing image customization methods through reinforcement learning on diffusion models. To facilitate the training of UMO, we develop a scalable customization dataset with multi-reference images, consisting of both synthesized and real parts. Additionally, we propose a new metric to measure identity confusion. Extensive experiments demonstrate that UMO not only improves identity consistency significantly, but also reduces identity confusion on several image customization methods, setting a new state of the art among open-source methods in identity preservation. Code and models: https://github.com/bytedance/UMO
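
The difference between matching each reference independently and solving one global assignment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the UMO implementation: it assumes identities are compared by cosine similarity of face embeddings, that the number of generated faces equals the number of references, and it uses brute-force search over permutations instead of a dedicated assignment solver. All function names and the toy vectors are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two embedding vectors (assumed similarity score)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def similarity_matrix(refs, gens):
    """sim[i][j] = similarity between reference i and generated face j."""
    return [[cosine(r, g) for g in gens] for r in refs]


def greedy_match(sim):
    """Each reference independently picks its best face; collisions are possible."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def global_match(sim):
    """One-to-one assignment maximizing total similarity (brute force, small N only)."""
    n = len(sim)
    if any(len(row) != n for row in sim):
        raise ValueError("this sketch assumes as many generated faces as references")
    best = max(permutations(range(n)),
               key=lambda p: sum(sim[i][p[i]] for i in range(n)))
    return list(best)


def matching_reward(sim):
    """Mean similarity over the globally matched pairs (hypothetical reward)."""
    assignment = global_match(sim)
    reward = sum(sim[i][j] for i, j in enumerate(assignment)) / len(sim)
    return assignment, reward


# Toy 2-D "embeddings": both references resemble generated face 0 most.
refs = [[1.0, 0.1], [1.0, 0.5]]
gens = [[1.0, 0.0], [0.0, 1.0]]

sim = similarity_matrix(refs, gens)
assignment, reward = matching_reward(sim)

print(greedy_match(sim))  # [0, 0]: both references claim the same face
print(assignment)         # [0, 1]: each reference gets its own face
```

In the toy data, independent matching assigns both references to the same generated face, while the global assignment forces distinct pairings and scores the image accordingly. For more than a handful of identities, a polynomial-time assignment solver would replace the permutation search.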

Problem

Research questions and friction points this paper is trying to address.

Maintaining consistent identity in multi-reference image customization
Avoiding identity confusion with scalable multi-identity preservation
Enhancing identity consistency for diffusion-based image customization methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Multi-identity Optimization framework
Multi-to-multi matching paradigm
Reinforcement learning on diffusion models

Authors

Yufeng Cheng
UXO Team, Intelligent Creation Lab, ByteDance
Wenxu Wu
UXO Team, Intelligent Creation Lab, ByteDance
Shaojin Wu
UXO Team, Intelligent Creation Lab, ByteDance
Mengqi Huang
University of Science and Technology of China
Image Generation, Video Generation, Unified Multimodal Generation, Generative AI
Fei Ding
Unknown affiliation
Qian He
ByteDance