🤖 AI Summary
This work addresses the lack of a unified modeling framework for multimodal generation tasks. We propose a unified generative model with dual decoding paths: independent text and image decoders with unshared parameters and a decoupled image tokenizer, enabling native support for text-to-image synthesis, image editing, and in-context generation while preserving strong text generation capability. A reflection mechanism tailored to image generation is introduced to enable iterative refinement, and we construct high-quality, task-specific multimodal datasets along with a dedicated training pipeline. Experiments demonstrate competitive performance on both text-to-image and image editing benchmarks. Notably, our model achieves state-of-the-art consistency scores among open-source models on the OmniContext benchmark, improving cross-task generalization and cross-modal synergy.
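The dual-decoding-path design can be pictured as a simple router: text requests go to the original autoregressive decoder, while image requests pass through a decoupled image tokenizer into a separate image decoder. The sketch below is a minimal, hypothetical illustration (all class and method names are invented; the real model uses a transformer text decoder and a diffusion-based image decoder), not the actual implementation.

```python
class TextDecoder:
    """Autoregressive pathway; the base model's text weights stay untouched."""
    def decode(self, tokens):
        return f"text({len(tokens)} tokens)"

class ImageTokenizer:
    """Decoupled tokenizer: image latents never enter the text decoder."""
    def encode(self, image):
        return [0.0] * 16  # placeholder latent vector

class ImageDecoder:
    """Separate, unshared parameters for the image generation pathway."""
    def decode(self, condition, latents):
        return f"image(cond={condition!r}, {len(latents)} latents)"

class DualPathModel:
    def __init__(self):
        self.text_decoder = TextDecoder()
        self.image_tokenizer = ImageTokenizer()
        self.image_decoder = ImageDecoder()

    def generate(self, prompt_tokens, target="text", ref_image=None):
        # Route to the pathway matching the requested output modality.
        if target == "text":
            return self.text_decoder.decode(prompt_tokens)
        latents = self.image_tokenizer.encode(ref_image) if ref_image else []
        return self.image_decoder.decode(prompt_tokens, latents)
```

Because the two pathways share no parameters, image-generation training cannot degrade the text pathway, which is why the base understanding model's capabilities are preserved.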
📝 Abstract
In this work, we introduce OmniGen2, a versatile and open-source generative model designed to provide a unified solution for diverse generation tasks, including text-to-image, image editing, and in-context generation. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer. This design enables OmniGen2 to build upon existing multimodal understanding models without the need to re-adapt VAE inputs, thereby preserving the original text generation capabilities. To facilitate the training of OmniGen2, we developed comprehensive data construction pipelines, encompassing image editing and in-context generation data. Additionally, we introduce a reflection mechanism tailored for image generation tasks and curate a dedicated reflection dataset based on OmniGen2. Despite its relatively modest parameter size, OmniGen2 achieves competitive results on multiple task benchmarks, including text-to-image and image editing. To further evaluate in-context generation, also referred to as subject-driven tasks, we introduce a new benchmark named OmniContext. OmniGen2 achieves state-of-the-art performance among open-source models in terms of consistency. We will release our models, training code, datasets, and data construction pipeline to support future research in this field. Project Page: https://vectorspacelab.github.io/OmniGen2; GitHub Link: https://github.com/VectorSpaceLab/OmniGen2
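The reflection mechanism can be understood as a critique-and-retry loop: the model inspects its own output and regenerates with feedback until the result passes. The toy loop below is purely illustrative (the `generate` and `critique` callables are hypothetical stand-ins; in OmniGen2 the behavior is learned from a curated reflection dataset rather than hard-coded):

```python
def generate_with_reflection(generate, critique, prompt, max_rounds=3):
    """Toy critique-and-retry loop (illustrative only).

    generate(prompt, feedback) -> image: produces an image, optionally
        conditioned on feedback from the previous round.
    critique(prompt, image) -> (ok, feedback): judges the image against
        the prompt and suggests a correction if it fails.
    """
    feedback = None
    image = None
    for _ in range(max_rounds):
        image = generate(prompt, feedback)
        ok, feedback = critique(prompt, image)
        if ok:
            break  # the output satisfies the prompt; stop refining
    return image
```

In the unified-model setting, both roles are played by the same network: the text pathway critiques, and the image pathway regenerates.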