Mosaic: Towards Efficient Training of Multimodal Models with Spatial Resource Multiplexing

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses inefficient multimodal model training: when each GPU hosts a single module, that module often cannot fully occupy the hardware, so GPU resources go underutilized. To tackle this, the study introduces spatial resource sharing into this domain for the first time and proposes a temporal-spatial resource multiplexing training approach. The method co-locates multiple modules on a single GPU and combines a lightweight execution engine, fine-grained resource quota control, a performance model that estimates module execution time, and a heuristic deployment planning algorithm to coordinate modules and schedule resources. Testbed experiments show that the approach achieves up to a 1.31× speedup in training while improving GPU utilization.

📝 Abstract

With the wide adoption of Multimodal Models (MMs) in real-world scenarios, it is important to efficiently train emerging MMs, which exhibit increasingly complex module architectures. For MM deployment, existing works allocate a GPU to only one MM module at a time in a temporal-multiplexing manner; this compromises training efficiency because a single module often fails to achieve high GPU utilization. To improve GPU utilization and enable efficient MM training, we propose deploying MMs in a temporal-spatial multiplexing manner, allowing multiple MM modules to colocate on a GPU with well-controlled resource quotas. In this paper, we present Mosaic, an efficient MM training system that applies temporal-spatial multiplexing. We first develop a flexible and lightweight execution engine that supports MM training with arbitrary resource quotas, and then build a comprehensive and accurate performance model to estimate module execution time under different allocation plans. With the performance model, we further adopt effective heuristics to derive high-quality MM deployment plans efficiently. Testbed experiments confirm that Mosaic effectively improves the training efficiency of popular MMs, with a training speedup of up to 1.31×.
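To make the trade-off between temporal and temporal-spatial multiplexing concrete, the sketch below compares the two for a pair of modules on one GPU. It is an illustration only, not the system's actual performance model or planner: the `Module` class, the `saturation_pct` parameter, the inverse-scaling cost model in `exec_time`, and all timing numbers are hypothetical assumptions chosen to keep the arithmetic simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical description of one MM module (not from the paper)."""
    name: str
    full_gpu_time_ms: float  # assumed step time when given the whole GPU
    saturation_pct: int      # assumed share of the GPU the module can actually use


def exec_time(module: Module, quota_pct: int) -> float:
    """Toy cost model: time is flat at or above the saturation point and
    grows inversely with the quota below it."""
    effective = min(quota_pct, module.saturation_pct)
    return module.full_gpu_time_ms * module.saturation_pct / effective


def temporal_time(a: Module, b: Module) -> float:
    """Temporal multiplexing: modules run one after another, each with 100% of the GPU."""
    return exec_time(a, 100) + exec_time(b, 100)


def best_spatial_split(a: Module, b: Module, step_pct: int = 5):
    """Exhaustively try quota splits and keep the one with the shortest
    colocated step time (the slower of the two modules)."""
    best_quota, best_time = None, float("inf")
    for quota_a in range(step_pct, 100, step_pct):
        t = max(exec_time(a, quota_a), exec_time(b, 100 - quota_a))
        if t < best_time:
            best_quota, best_time = quota_a, t
    return best_quota, best_time


# Made-up example values for two modules.
encoder = Module("vision_encoder", full_gpu_time_ms=40.0, saturation_pct=40)
backbone = Module("llm_backbone", full_gpu_time_ms=100.0, saturation_pct=60)

quota, colocated = best_spatial_split(encoder, backbone)
sequential = temporal_time(encoder, backbone)
print(f"temporal: {sequential:.1f} ms, colocated: {colocated:.1f} ms "
      f"(encoder quota {quota}%)")
```

Under these assumed numbers, colocating the two modules is faster than running them back to back because neither module can use the full GPU on its own. A real planner would replace the toy cost model with measured or learned estimates and would need heuristics, since exhaustive search stops being practical with many modules and GPUs.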

Problem

Research questions and friction points this paper is trying to address.

Multimodal Models

Training Efficiency

GPU Utilization

Resource Multiplexing

Model Deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal-Spatial Multiplexing

Multimodal Model Training

GPU Resource Allocation