MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT

📅 2024-11-18

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing methods for IoT multi-modal sensing data—characterized by distribution, heterogeneity, sparsity, and lack of annotations—fail in practice due to their reliance on fully synchronized, densely labeled data. Method: We propose a shared descriptive modality–driven data binding mechanism to construct pseudo-paired training sets, enabling joint learning under weak synchronization and arbitrary modality missingness; design a weighted contrastive learning objective to mitigate cross-domain representation shift; and develop an adaptive fusion architecture that uniformly handles dynamic modality combinations. Contributions/Results: We introduce (i) the first shared-modality–driven data binding paradigm, (ii) a cross-domain robust weighted contrastive objective, and (iii) a modality-agnostic adaptive architecture. Evaluated on 10 real-world IoT multi-modal datasets, our approach significantly outperforms state-of-the-art methods, demonstrating strong robustness to high modality missingness (≥80%) and domain shift—establishing a new foundation for IoT multi-modal foundation models.

📝 Abstract

Multimodal sensing systems are increasingly prevalent in various real-world applications. Most existing multimodal learning approaches heavily rely on training with a large amount of synchronized, complete multimodal data. However, such a setting is impractical in real-world IoT sensing applications where data is typically collected by distributed nodes with heterogeneous data modalities, and is also rarely labeled. In this paper, we propose MMBind, a new data binding approach for multimodal learning on distributed and heterogeneous IoT data. The key idea of MMBind is to construct a pseudo-paired multimodal dataset for model training by binding data from disparate sources and incomplete modalities through a sufficiently descriptive shared modality. We also propose a weighted contrastive learning approach to handle domain shifts among disparate data, coupled with an adaptive multimodal learning architecture capable of training models with heterogeneous modality combinations. Evaluations on ten real-world multimodal datasets highlight that MMBind outperforms state-of-the-art baselines under varying degrees of data incompleteness and domain shift, and holds promise for advancing multimodal foundation model training in IoT applications. The source code is available at https://github.com/nesl/multimodal-bind.
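
The binding idea in the abstract can be illustrated with a minimal sketch. It assumes each node stores samples as (shared-modality embedding, other-modality payload) tuples and that binding is done by nearest-neighbor cosine similarity on the shared modality. The function names, the data layout, and the similarity choice are all hypothetical and are not taken from the MMBind code base.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bind_by_shared_modality(set_a, set_b):
    """Pair each (shared, x) sample in set_a with the (shared, y) sample in
    set_b whose shared-modality embedding is most similar (assumed rule)."""
    pairs = []
    for shared_a, x in set_a:
        best_sim, best_y = max(
            ((cosine(shared_a, shared_b), y) for shared_b, y in set_b),
            key=lambda t: t[0],
        )
        # Keep the similarity so later stages can weight the pseudo-pair.
        pairs.append({"x": x, "y": best_y, "similarity": best_sim})
    return pairs

# Toy data: node A holds modality x, node B holds modality y,
# and both hold a 2-D embedding of the shared modality.
node_a = [([1.0, 0.0], "acc_walk"), ([0.0, 1.0], "acc_sit")]
node_b = [([0.1, 0.9], "gyro_sit"), ([0.9, 0.1], "gyro_walk")]
pairs = bind_by_shared_modality(node_a, node_b)
```

The resulting pairs were never recorded together; they are matched only because their shared-modality embeddings are close, which is why the similarity score is kept alongside each pair.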

Problem

Research questions and friction points this paper is trying to address.

Handling distributed and heterogeneous IoT data for multimodal learning

Constructing pseudo-paired datasets from incomplete and unlabeled data

Addressing domain shifts and modality heterogeneity in IoT applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs a pseudo-paired multimodal dataset by binding data through a shared modality

Uses weighted contrastive learning to handle domain shifts among disparate data

Adaptive architecture that trains with heterogeneous modality combinations
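
As a rough illustration of the weighted contrastive idea listed above, the sketch below scales a standard InfoNCE loss by a per-pair weight, so that less reliable pseudo-pairs contribute less. The weighting scheme, the function name, and the temperature value are assumptions made for this example; the paper's actual objective may differ.

```python
import math

def weighted_info_nce(sim, weights, temperature=0.1):
    """Weighted InfoNCE over a similarity matrix.

    sim[i][j] is the similarity between anchor i and candidate j; the
    positive for anchor i is candidate i. weights[i] is the assumed
    confidence of pseudo-pair i (for example, its shared-modality similarity).
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += weights[i] * (log_denom - logits[i])
    return total / sum(weights)

# The second pseudo-pair is mismatched: its anchor is closer to candidate 0.
sim = [[1.0, 0.0], [1.0, 0.0]]
uniform = weighted_info_nce(sim, [1.0, 1.0])
down_weighted = weighted_info_nce(sim, [1.0, 0.1])
```

With uniform weights the mismatched pair dominates the loss; lowering its weight reduces its influence on the average, which is the intended effect of weighting pseudo-pairs drawn from shifted domains.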
