FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion

📅 2024-02-05
🏛️ Neural Information Processing Systems
📈 Citations: 16
✨ Influential: 1
🤖 AI Summary
This paper addresses three pervasive challenges in multimodal time-series data: modality missingness, irregular sampling, and scarcity of labeled examples. To this end, we propose a scalable Mixture-of-Experts (MoE) architecture that integrates a Transformer-based backbone with modality-agnostic feature alignment and a missingness-aware routing strategy to enable robust cross-modal fusion. Crucially, we introduce the first dynamic sparse gating mechanism explicitly designed for incomplete and asynchronous multimodal inputs; it carries theoretical convergence guarantees and supports arbitrary subsets of available modalities. Experimental results across multiple real-world time-series prediction benchmarks demonstrate that the approach significantly outperforms existing state-of-the-art methods, and it maintains strong generalization and stability even under extreme conditions: modality missing rates exceeding 60% and highly irregular sampling patterns.
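The summary above describes the routing only at a high level, so here is a minimal PyTorch sketch of the general pattern it names: per-modality embeddings pass through a sparse top-k gate, and modalities flagged as missing receive zero routing weight. Every name and hyperparameter here (MissingAwareMoE, eight experts, k=2, the 4x expert width) is an illustrative assumption, not the paper's implementation.

```python
# Illustrative sketch of a missingness-aware top-k MoE fusion layer.
# Names and hyperparameters are assumptions, not FuseMoE's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MissingAwareMoE(nn.Module):
    """Top-k gated MoE over per-modality embeddings with missingness masking."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, present: torch.Tensor) -> torch.Tensor:
        # x:       (batch, num_modalities, dim), one embedding per modality
        # present: (batch, num_modalities) bool, False where a modality is missing
        gates = F.softmax(self.gate(x), dim=-1)                   # (B, M, E)
        topv, topi = gates.topk(self.k, dim=-1)                   # keep top-k experts
        sparse = torch.zeros_like(gates).scatter_(-1, topi, topv)
        sparse = sparse / sparse.sum(-1, keepdim=True).clamp_min(1e-9)
        sparse = sparse * present.unsqueeze(-1).float()           # zero missing modalities
        # Dense dispatch for clarity: run every expert, then mix (simple, not efficient).
        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)  # (B, M, E, D)
        return (sparse.unsqueeze(-1) * expert_out).sum(dim=-2)    # (B, M, D)
```

A call like layer(x, present) with present[b, m] = False simply drops modality m of sample b out of the mixture, which is the behavior the summary's "arbitrary subsets of available modalities" refers to.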

📝 Abstract
As machine learning models in critical fields increasingly grapple with multimodal data, they face the dual challenges of handling a wide array of modalities, often incomplete due to missing elements, and the temporal irregularity and sparsity of collected samples. Successfully leveraging this complex data, while overcoming the scarcity of high-quality training samples, is key to improving these models' predictive performance. We introduce "FuseMoE", a mixture-of-experts framework incorporating an innovative gating function. Designed to integrate a diverse range of modalities, FuseMoE is effective in managing scenarios with missing modalities and irregularly sampled data trajectories. Theoretically, our unique gating function contributes to enhanced convergence rates, leading to better performance in multiple downstream tasks. The practical utility of FuseMoE in the real world is validated by a diverse set of challenging prediction tasks.
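The abstract does not spell out the gating function itself, so as a point of reference, here is the standard softmax-gated MoE formulation that such a gate would replace. The notation (N experts f_i, gate weights g_i) is generic, not taken from the paper.

```latex
% Generic (dense) softmax-gated mixture-of-experts, for orientation only;
% FuseMoE's gate modifies this baseline in a way not detailed on this page.
\[
  y(x) \;=\; \sum_{i=1}^{N} g_i(x)\, f_i(x),
  \qquad
  g_i(x) \;=\; \frac{\exp\!\left(w_i^{\top} x\right)}{\sum_{j=1}^{N} \exp\!\left(w_j^{\top} x\right)} .
\]
% Sparse variants keep only the top-k gate values and renormalize them,
% so each input activates only k of the N experts.
```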
Problem

Research questions and friction points this paper is trying to address.

Handling incomplete multimodal data with missing elements
Managing temporal irregularity and sparsity in samples
Overcoming scarcity of high-quality multimodal training samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-experts framework with an innovative gating function (a hypothetical sketch follows this list)
Handles missing modalities and irregularly sampled data
Enhances convergence rates for better downstream performance
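As a concrete illustration of the kind of gate the first item refers to: one family studied in the MoE convergence literature replaces the softmax's dot-product scores with negative distances to learned expert centroids, a Laplace-style gate. Whether this matches FuseMoE's exact gating function is not stated on this page, so treat the sketch below as an assumption.

```python
# Hypothetical Laplace-style gate: weights experts by proximity of the
# input to learned expert centroids rather than by softmax over dot
# products. This is an assumption for illustration; the page above only
# says "innovative gating function".
import torch
import torch.nn.functional as F

def laplace_gate(x: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """x: (batch, dim); centroids: (num_experts, dim) -> (batch, num_experts)."""
    dist = torch.cdist(x, centroids)   # pairwise L2 distances
    return F.softmax(-dist, dim=-1)    # nearer centroid -> larger gate weight
```

Distance-based gates of this shape are the kind of modification for which the MoE theory literature has derived improved parameter-estimation rates, which is what the "enhanced convergence rates" claim above alludes to.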
👥 Authors

Xing Han
Department of Computer Science, Johns Hopkins University, Baltimore, MD

Huy Nguyen
Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX

C. Harris
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD

Nhat Ho
Assistant Professor, University of Texas at Austin
Research areas: Machine Learning, Bayesian Statistics, Optimization, Optimal Transport, Deep Learning

S. Saria
Bayesian Health, New York, NY