Quantized-TinyLLaVA: a new multimodal foundation model enables efficient split learning

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high communication overhead and privacy challenges of split learning for large models across distributed devices, this paper proposes an efficient split-learning framework tailored to multimodal foundation models. The method introduces (1) a learnable embedding quantization scheme grounded in entropy coding theory, which maps high-dimensional embeddings to low-bit integer representations, and (2) joint end-to-end optimization of the discrete representation levels together with the multimodal model architecture, which (3) substantially reduces cross-device communication while preserving model accuracy. Experiments show the approach cuts communication overhead by over 60% in resource-constrained edge settings while strictly adhering to data-localization privacy constraints. This work provides a practical pathway for collaborative training of multimodal large models at the edge.
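A minimal sketch of the learnable quantization idea from point (1), written in PyTorch. It assumes a straight-through estimator and a single learnable step size; the class name and all details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LearnableQuantizer(nn.Module):
    """Hypothetical sketch: map continuous embeddings to low-bit integer
    codes with a learnable step size. A straight-through estimator keeps
    the rounding step differentiable for end-to-end training."""
    def __init__(self, num_levels: int = 16):        # 16 levels -> 4-bit codes
        super().__init__()
        self.half = num_levels // 2                  # symmetric code range
        self.step = nn.Parameter(torch.tensor(0.1))  # learnable step size

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        scaled = z / self.step
        q = scaled.round().clamp(-self.half, self.half - 1)  # integer codes
        # Straight-through estimator: quantized value on the forward pass,
        # identity gradient on the backward pass.
        q = scaled + (q - scaled).detach()
        return q * self.step                         # dequantize for next partition
```

Only the integer codes (plus the scalar step size) would need to cross the network between partitions, which is where the transmission savings come from.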

📝 Abstract
Split learning is a well-known approach to data privacy: a model is trained across distributed devices so that raw data never needs to be shared. However, high network communication costs remain an impediment to split learning, especially for large foundation models that must transmit large amounts of high-dimensional data. To resolve this issue, we present a new multimodal model structure that incorporates a learning-based data compression method, compressing model embeddings into low-bit integers while preserving the model's performance and greatly reducing transmission costs between partitions. We then determine the optimal number of discrete representation levels based on a solid theoretical foundation from entropy coding.
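
The abstract's last sentence appeals to entropy coding: by Shannon's source-coding theorem, the empirical entropy of the quantized codes lower-bounds the average bits any lossless coder needs per transmitted symbol, so it can guide the choice of the number of levels. A small sketch of estimating that bound (the function below is an illustration, not code from the paper):

```python
import torch

def entropy_bits(codes: torch.Tensor, num_levels: int) -> float:
    """Empirical entropy (bits per symbol) of integer embedding codes;
    a lower bound on the average rate of any lossless entropy coder."""
    # Shift symmetric codes in [-L/2, L/2) to non-negative bin indices.
    idx = codes.flatten().long() + num_levels // 2
    hist = torch.bincount(idx, minlength=num_levels).float()
    p = hist / hist.sum()
    p = p[p > 0]                        # drop empty bins (0 * log 0 = 0)
    return float(-(p * p.log2()).sum())

# Sweeping num_levels and comparing entropy_bits against task accuracy
# traces the rate-accuracy trade-off that determines the optimal levels.
codes = torch.randn(10_000).mul(2).round().clamp(-8, 7)
print(entropy_bits(codes, num_levels=16))  # well below 4 bits for peaked data
```
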
Problem

Research questions and friction points this paper is trying to address.

Reducing network communication costs in split learning
Compressing model embeddings into low-bit integers efficiently
Determining optimal discrete representation levels using entropy coding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal model structure enables efficient split learning
Learning-based compression reduces embedding transmission costs (see the sketch after this list)
Optimal discrete levels determined using entropy coding theory
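
To make the transmission-cost bullet concrete, here is a hypothetical sketch of the client-to-server hop in such a split; the function names and the int8 carrier format are assumptions, not the paper's actual protocol.

```python
import torch

def client_encode(z: torch.Tensor, step: float, half: int):
    """Client side: quantize the cut-layer embedding to small integer codes.
    Only `codes` (int8) and the scalar `step` cross the network."""
    codes = (z / step).round().clamp(-half, half - 1).to(torch.int8)
    return codes, step

def server_decode(codes: torch.Tensor, step: float) -> torch.Tensor:
    """Server side: reconstruct embeddings and run the remaining partition."""
    return codes.float() * step

z = torch.randn(1, 256)                         # client-side embedding
codes, step = client_encode(z, step=0.1, half=8)
z_hat = server_decode(codes, step)
print(codes.element_size() * codes.nelement())  # 256 bytes on the wire
print(z.element_size() * z.nelement())          # 1024 bytes as raw fp32
```

An int8 carrier alone is a 4x saving over fp32; packing two 4-bit codes per byte or entropy coding the codes pushes the reduction further, consistent with the over-60% figure reported above.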
Jiajun Guo
Department of Statistics, University of Michigan, Ann Arbor, MI 48109
Xin Luo
University of Science and Technology of China
Jie Liu
Gilbert S. Omenn Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109