MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multimodal fusion evaluation is hindered by small-scale, narrow-domain, task-specific, and inconsistently standardized benchmarks, leading to poor model generalizability and incomparable results. To address this, we propose MULTIBENCH++, the first large-scale, domain-adaptive multimodal fusion benchmark, integrating over 30 datasets, 15 modalities, and 20 predictive tasks across critical domains including healthcare, remote sensing, and industrial inspection. We design a unified cross-domain evaluation framework and an open-source automated pipeline supporting early-, late-, and hybrid-fusion paradigms. The framework incorporates standardized preprocessing, cross-modal alignment, and domain-adaptation mechanisms. Extensive experiments establish multiple new state-of-the-art baselines, significantly improving model generalizability and reproducibility. MULTIBENCH++ provides a rigorous, open, and extensible evaluation infrastructure for advancing multimodal fusion research.
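To make the fusion paradigms mentioned in the summary concrete, below is a minimal sketch of early fusion (concatenating per-modality features before a predictor) versus late fusion (averaging per-modality predictions). The modality names, dimensions, and values are toy assumptions for illustration, not the MULTIBENCH++ pipeline's API.

```python
import numpy as np

# Toy per-modality features (illustrative only).
rng = np.random.default_rng(0)
image_feat = rng.standard_normal((4, 8))   # batch of 4, 8-dim image features
audio_feat = rng.standard_normal((4, 6))   # batch of 4, 6-dim audio features

def early_fusion(feats):
    """Early fusion: concatenate modality features into one joint vector."""
    return np.concatenate(feats, axis=-1)

def late_fusion(probs):
    """Late fusion: average each modality's class-probability outputs."""
    return np.mean(np.stack(probs, axis=0), axis=0)

fused = early_fusion([image_feat, audio_feat])
print(fused.shape)  # (4, 14): joint feature fed to a single downstream model

# Toy per-modality softmax outputs over 3 classes.
p_img = np.array([[0.7, 0.2, 0.1]])
p_aud = np.array([[0.5, 0.3, 0.2]])
print(late_fusion([p_img, p_aud]))  # [[0.6  0.25 0.15]]
```

Hybrid fusion, the third paradigm named in the summary, typically mixes both: some modalities are merged at the feature level while their outputs are later combined with other modality-specific predictors.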

📝 Abstract
Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.
Problem

Research questions and friction points this paper is trying to address.

Lack of adequate evaluation benchmarks hinders multimodal fusion progress
Current methods evaluated on limited datasets create biased assessments
Absence of unified standards prevents fair comparison between fusion approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale domain-adaptive multimodal evaluation benchmark
Unified automated pipeline with standardized model implementations
Integrated datasets across multiple modalities and predictive tasks
Leyan Xue
College of Intelligence and Computing, Tianjin University, Tianjin, China
Changqing Zhang
Professor, Tianjin University
Machine Learning · Multimodal Learning · LLM
Kecheng Xue
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Xiaohong Liu
Institute of Medical Artificial Intelligence, South China Hospital, Medical School, Shenzhen University, Guangdong, China
Guangyu Wang
Houston Methodist
Bioinformatics · Computational biology · AI · Epigenetics
Zongbo Han
Assistant Professor, BUPT; TJU
Machine Learning