🤖 AI Summary
To address the high computational overhead and memory bottlenecks that Transformer-based large language models (LLMs) incur in networking applications, which stem from their quadratic time complexity and massive parameter counts, this paper proposes Mamba4Net, the first framework to introduce the linear-complexity Mamba architecture into networked systems. It further designs a cross-architecture knowledge distillation mechanism that efficiently transfers network-specific knowledge, including viewport movement patterns, bitrate adaptation dynamics, and task dependency relationships, from a Transformer-based teacher model to a lightweight Mamba student model. The approach significantly improves deployment efficiency: on viewport prediction, adaptive bitrate streaming, and cluster job scheduling, it delivers 3.96× higher throughput, reduces the model storage footprint to just 5.48% of the LLM baseline, and matches or surpasses the performance of non-LLM baselines.
📝 Abstract
Transformer-based large language models (LLMs) are increasingly being adopted in networking research to address domain-specific challenges. However, their quadratic time complexity and substantial model sizes often result in significant computational overhead and memory constraints, particularly in resource-constrained environments. Drawing inspiration from the efficiency and performance of the Deepseek-R1 model within the knowledge distillation paradigm, this paper introduces Mamba4Net, a novel cross-architecture distillation framework. Mamba4Net transfers networking-specific knowledge from Transformer-based LLMs to student models built on the Mamba architecture, which has linear time complexity. This design substantially improves computational efficiency relative to the quadratic complexity of Transformer-based models, while the smaller model size further reduces memory demands, improving overall performance and resource utilization. To evaluate its effectiveness, Mamba4Net was tested on three diverse networking tasks: viewport prediction, adaptive bitrate streaming, and cluster job scheduling. Compared to existing methods that do not leverage LLMs, Mamba4Net demonstrates superior task performance. Furthermore, relative to direct applications of Transformer-based LLMs, it achieves significant efficiency gains, including 3.96 times higher throughput and a storage footprint of only 5.48% of that required by previous LLM-based approaches. These results highlight Mamba4Net's potential to enable the cost-effective application of LLM-derived knowledge in networking contexts. The source code is openly available to support further research and development.
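The abstract does not spell out the distillation objective, but the general idea of transferring knowledge from a frozen Transformer teacher to a smaller linear-time student can be illustrated with standard soft-target distillation. The sketch below is a hypothetical minimal example, not Mamba4Net's actual mechanism: a GRU stands in for the Mamba student block, and the class names, dimensions, temperature, and loss weighting are illustrative assumptions.

```python
# Hypothetical sketch of cross-architecture soft-target distillation.
# A frozen Transformer teacher supervises a lightweight linear-time student;
# a GRU stands in for a Mamba block, since the paper's exact design is not shown here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherTransformer(nn.Module):
    """Stand-in for a frozen Transformer-based teacher (quadratic attention)."""
    def __init__(self, d_model=256, n_classes=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                          # x: (batch, seq, d_model)
        return self.head(self.encoder(x)[:, -1])   # logits from the last step

class StudentModel(nn.Module):
    """Lightweight linear-time student; GRU used here as a placeholder for Mamba."""
    def __init__(self, d_in=256, d_model=64, n_classes=10):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        h, _ = self.rnn(self.proj(x))
        return self.head(h[:, -1])

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the soft-target KL term (teacher -> student) with the hard-label task loss."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

if __name__ == "__main__":
    teacher, student = TeacherTransformer().eval(), StudentModel()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    x = torch.randn(8, 32, 256)                # e.g. 32-step network traces (illustrative)
    y = torch.randint(0, 10, (8,))             # e.g. discrete bitrate choices (illustrative)
    with torch.no_grad():
        t_logits = teacher(x)                  # teacher stays frozen
    loss = distillation_loss(student(x), t_logits, y)
    loss.backward()
    opt.step()
```

In this kind of setup only the student is updated, so deployment cost is governed by the student's size and linear-time sequence processing, which is the efficiency argument the abstract makes for the Mamba-based student.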