MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion

📅 2026-05-13

🤖 AI Summary

Existing clinical multimodal AI systems lack a systematic evaluation of how robust fusion architectures are to sensor failures, including both complete modality absence and contiguous within-modality gaps. To address this gap, this work proposes MuteBench, a benchmark spanning seven clinical domains, nine datasets, six fusion architectures, and two missingness patterns. It enables unified assessment of fault tolerance at controlled levels of missing-data severity. The study finds that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models are resilient to full modality dropout but more sensitive to within-modality gaps, especially on short sequences. Curriculum-based modality dropout protects reliably only up to the maximum dropout rate used during training. Finally, a PTB-XL case study suggests that diffusion-based imputation can improve classification under within-modality missingness, with the largest gains for models whose expert routing is most sensitive to corrupted inputs.
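To make the curriculum finding concrete, here is a minimal sketch of curriculum modality dropout. It is not the MuteBench training code. It assumes a linear ramp of the dropout rate from 0 to a cap `p_max` over `warmup_epochs`, independent per-modality dropping, and at least one surviving modality per sample. The functions `curriculum_rate` and `modality_dropout` and the `(M, T)` input layout are hypothetical choices made for illustration.

```python
import numpy as np


def curriculum_rate(epoch: int, warmup_epochs: int, p_max: float) -> float:
    """Hypothetical schedule: ramp the modality-dropout rate linearly
    from 0 to p_max over warmup_epochs, then hold it at p_max."""
    if warmup_epochs <= 0:
        return p_max
    return p_max * min(1.0, epoch / warmup_epochs)


def modality_dropout(x: np.ndarray, p: float, rng: np.random.Generator):
    """Drop each modality (row of x, shape (M, T)) independently with
    probability p. Assumption: keep at least one modality so the model
    never sees an all-empty input."""
    m = x.shape[0]
    keep = rng.random(m) >= p          # True = modality survives
    if not keep.any():
        keep[rng.integers(m)] = True   # rescue one random modality
    return x * keep[:, None], keep     # zero-fill dropped rows


# Usage: the rate used at each epoch never exceeds p_max.
rng = np.random.default_rng(0)
x = np.ones((3, 8))
for epoch in range(12):
    p = curriculum_rate(epoch, warmup_epochs=10, p_max=0.5)
    x_train, kept = modality_dropout(x, p, rng)
```

Under this sketch, the model never trains on missingness rates above `p_max`. That matches the reported pattern: protection holds up to the training cap, and robustness is not guaranteed beyond it.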

📝 Abstract

Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and within-modality missing, where a contiguous time segment is lost. No existing benchmark evaluates multiple fusion architectures under both failure modes at controlled severity levels across diverse clinical datasets. We present MuteBench, a benchmark covering 9 datasets from 7 clinical domains, 6 fusion architectures, and 2 missing-data modes, spanning over 125,000 samples. Through this benchmark, we find that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models tolerate modality missing well but can be sensitive to within-modality missing, especially on short sequences. Curriculum modality dropout protects reliably only up to the maximum dropout rate used in training. We also find that channel count, sequence length, and modality alignment jointly determine which failure mode poses the greater threat. Finally, a PTB-XL case study suggests that diffusion-based imputation can improve downstream classification under within-modality missing, with the largest gains for models whose expert routing is most sensitive to corrupted inputs, though broader validation across datasets remains an open direction. MuteBench gives practitioners concrete guidance both for selecting among existing architectures and for designing future robust multimodal fusion methods.
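The two failure modes can be illustrated with a minimal masking sketch. It is not MuteBench's actual corruption code. It assumes each sample is an `(M, T)` array with one row per modality, that missing values are zero-filled, and that a scalar `severity` in [0, 1] sets either the fraction of modalities removed or the fraction of each series lost. In within-modality missing, every modality loses one independently placed contiguous gap. The functions `modality_missing` and `within_modality_missing` are hypothetical names.

```python
import numpy as np


def modality_missing(x: np.ndarray, severity: float, rng: np.random.Generator):
    """Modality missing: zero out round(severity * M) whole rows of x (M, T).
    Returns the corrupted array and a boolean mask (True = observed)."""
    m, _ = x.shape
    k = int(round(severity * m))
    mask = np.ones(x.shape, dtype=bool)
    dropped = rng.choice(m, size=k, replace=False)  # distinct modalities
    mask[dropped, :] = False
    return np.where(mask, x, 0.0), mask


def within_modality_missing(x: np.ndarray, severity: float, rng: np.random.Generator):
    """Within-modality missing: in every modality, zero out one contiguous
    segment of length round(severity * T) at a random start position."""
    m, t = x.shape
    length = int(round(severity * t))
    mask = np.ones(x.shape, dtype=bool)
    if length > 0:
        for i in range(m):
            start = rng.integers(0, t - length + 1)  # start in [0, t - length]
            mask[i, start:start + length] = False
    return np.where(mask, x, 0.0), mask


# Usage: sweep severity to trace a robustness curve for one sample.
rng = np.random.default_rng(0)
sample = rng.standard_normal((4, 100))
for severity in (0.0, 0.25, 0.5, 0.75):
    x_mod, _ = modality_missing(sample, severity, rng)
    x_seg, _ = within_modality_missing(sample, severity, rng)
```

The sketch returns the mask together with the corrupted array, so a downstream model or imputer can distinguish true zeros from missing values. Whether MuteBench exposes such a mask is not stated here.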

Problem

Research questions and friction points this paper is trying to address.

multimodal fusion

modality missing

within-modality missing

robustness evaluation

clinical AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal fusion

missing modality

robustness benchmark