UmniBench: Unified Understanding and Generation Model Oriented Omni-dimensional Benchmark

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluation methods for unified multimodal models (UMMs) assess understanding, generation, and editing capabilities in isolation, lacking a holistic, synergistic benchmark. Method: We introduce UmniBench, the first comprehensive benchmark tailored for UMMs, covering 13 domains and 200+ concepts. It pioneers a self-consistent "understanding-driven generation/editing" evaluation paradigm, enabling integrated assessment and disentangled analysis of all three capabilities. UmniBench employs human-verified Prompt-QA pairs and combines end-to-end with modular evaluation to achieve closed-loop, multi-domain, multi-granularity assessment. Contribution/Results: We systematically evaluate 24 mainstream models, including UMMs and unimodal large language/vision models, revealing widespread capability imbalance for the first time. UmniBench provides reproducible diagnostic tools and principled optimization guidance, establishing a new standard for holistic UMM evaluation.
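The closed loop is easiest to see as pseudocode. Below is a minimal sketch of the understanding-driven generation evaluation, assuming a hypothetical model interface with generate and answer methods; the PromptQA schema and method names are illustrative, not the authors' actual API.

```python
# Hypothetical sketch of the "understanding-driven generation" loop: the model
# generates an image from a verified prompt, then answers verified questions
# about its own output. Interface and schema are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class PromptQA:
    prompt: str             # human-verified generation instruction
    questions: list[str]    # questions probing concepts the prompt requires
    answers: list[str]      # expected short answers (e.g., "yes"/"no")

def evaluate_generation(model, samples: list[PromptQA]) -> float:
    """Score generation by quizzing the model about its own generated images."""
    correct, total = 0, 0
    for s in samples:
        image = model.generate(s.prompt)           # exercises generation
        for q, gold in zip(s.questions, s.answers):
            pred = model.answer(image, q)          # exercises understanding
            correct += int(pred.strip().lower() == gold.strip().lower())
            total += 1
    return correct / total if total else 0.0
```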

📝 Abstract
Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. However, evaluations of unified multimodal models (UMMs) remain decoupled, assessing their understanding and generation abilities separately with corresponding datasets. To address this, we propose UmniBench, a benchmark tailored for UMMs with omni-dimensional evaluation. First, UmniBench assesses understanding, generation, and editing abilities within a single evaluation process. Based on human-examined prompts and QA pairs, UmniBench leverages the UMM itself, using its understanding ability to evaluate its own generation and editing outputs. This simple but effective paradigm allows comprehensive evaluation of UMMs. Second, UmniBench covers 13 major domains and more than 200 concepts, ensuring a thorough inspection of UMMs. Moreover, UmniBench can also decouple and separately evaluate understanding, generation, and editing abilities, providing a fine-grained assessment. Based on UmniBench, we benchmark 24 popular models, including both UMMs and single-ability large models. We hope this benchmark provides a more comprehensive and objective view of unified models and practical support for improving the performance of community models.
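The same loop extends to editing: the model edits a source image per a verified instruction, then its understanding is quizzed on the edited result. A minimal sketch, again with an assumed interface (the edit and answer method names are hypothetical):

```python
# Hypothetical editing counterpart of the loop above. Each sample is assumed
# to carry a source image, an edit instruction, and verified QA pairs.
def evaluate_editing(model, samples) -> float:
    """Score editing via the model's own understanding of its edited outputs."""
    correct, total = 0, 0
    for s in samples:
        edited = model.edit(s.source_image, s.instruction)   # exercises editing
        for q, gold in zip(s.questions, s.answers):
            pred = model.answer(edited, q)                   # exercises understanding
            correct += int(pred.strip().lower() == gold.strip().lower())
            total += 1
    return correct / total if total else 0.0
```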
Problem

Research questions and friction points this paper is trying to address.

Evaluations of UMMs remain decoupled, assessing understanding, generation, and editing separately on dedicated datasets
Existing benchmarks lack the domain and concept coverage needed for a thorough inspection of UMMs
There is no unified, objective basis for comparing UMMs with single-ability large models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multimodal benchmark assessing understanding, generation, and editing in one process
Self-consistent evaluation that uses the model's own understanding to judge its generation and editing
Coverage of 13 domains and 200+ concepts, with decoupled per-ability scoring (see the sketch below)
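To make the decoupled scores concrete, here is one plausible way to disentangle generation quality from judging quality; the normalization below is an illustrative assumption, not the paper's published formula.

```python
# Minimal sketch of one plausible disentangling step (an assumption, not the
# paper's exact formula): normalize the end-to-end generation score by the
# model's understanding accuracy on reference images, so that a weak judge
# does not masquerade as weak generation.
def disentangled_score(end_to_end_acc: float, understanding_acc: float) -> float:
    """Estimate generation quality with judging (understanding) error factored out."""
    if understanding_acc <= 0.0:
        return 0.0  # the model cannot judge at all; the score is uninformative
    return min(end_to_end_acc / understanding_acc, 1.0)

# Example: 0.60 end-to-end accuracy with a 0.80-accurate judge suggests
# generation quality of roughly 0.75 once judging error is discounted.
print(disentangled_score(0.60, 0.80))  # 0.75
```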
👥 Authors
Kai Liu (Shanghai Jiao Tong University)
Leyang Chen (Shanghai Jiao Tong University)
Wenbo Li (The Chinese University of Hong Kong) · Computer Vision, Deep Learning
Zhikai Chen (Huawei Technologies Ltd.)
Zhixin Wang (Zhejiang University) · RL systems
Renjing Pei (Huawei Technologies Ltd.)
Linghe Kong (Shanghai Jiao Tong University) · Internet of Things, Mobile Computing, Big Data
Yulun Zhang (Shanghai Jiao Tong University)