On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Frequent task switching in multi-task deployment on edge devices incurs substantial I/O overhead, degrading real-time responsiveness and resource efficiency. Method: This paper proposes DM-Sparse, an on-demand multi-task sparse framework that shifts sparse optimization from single-task model compression to cross-task switching efficiency. It employs block-granularity weight decomposition and cross-task sparse structure alignment to maximize parameter reuse, coupled with a dynamic differential module loading mechanism that loads only the incremental parameters required on a task switch. The design jointly optimizes for edge-platform memory constraints and I/O characteristics. Contribution/Results: Evaluated on a real-world autonomous driving platform, DM-Sparse reduces average task-switching latency by 6.6× compared with conventional sparsity-based approaches, significantly improving real-time performance and hardware resource utilization.
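The differential loading idea in the summary can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function names and block IDs are hypothetical. Each task retains only the weight blocks its sparse mask keeps, so a switch reduces to a set difference: load what the target task needs but is not resident, optionally evicting what it no longer uses.

```python
# Hypothetical sketch of differential block loading (names are illustrative,
# not from the paper). Each task is represented by the set of weight-block
# IDs its sparse mask retains.

def blocks_to_load(current: set[int], target: set[int]) -> set[int]:
    """Blocks needed by the target task that are not already resident."""
    return target - current

def blocks_to_evict(current: set[int], target: set[int]) -> set[int]:
    """Resident blocks the target task does not use (candidates to free)."""
    return current - target

# Example: two tasks whose masks overlap on 6 of 8 blocks, so the
# differential to load on a switch is only 2 blocks.
task_a = {0, 1, 2, 3, 4, 5, 6, 7}
task_b = {0, 1, 2, 3, 4, 5, 8, 9}

print(sorted(blocks_to_load(task_a, task_b)))   # → [8, 9]
print(sorted(blocks_to_evict(task_a, task_b)))  # → [6, 7]
```

The larger the cross-task overlap produced by sparse structure alignment, the smaller these differential sets, which is where the reported switching speedup comes from.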

📝 Abstract
Sparsity is essential for deploying large models on resource-constrained edge platforms. However, optimizing sparsity patterns for individual tasks in isolation ignores the significant I/O overhead incurred during frequent task switching. We introduce an on-demand multi-task sparsity framework specifically designed to minimize switching costs by maximizing parameter reuse. Unlike monolithic approaches, we decompose weights into reusable block-granular units and align sparse structures across tasks to maximize overlap. By dynamically loading only the small differential set of blocks required for the next task, our method effectively mitigates the cold-start latency inherent in traditional monolithic approaches. Experiments on a real-world autonomous driving platform demonstrate that our framework achieves superior switching efficiency, accelerating task switching by over 6.6× on average compared to existing sparsity methods.
Problem

Research questions and friction points this paper is trying to address.

Optimizing sparsity patterns for individual tasks ignores I/O overhead during frequent task switching
Minimizing switching costs by maximizing parameter reuse across multiple tasks
Mitigating cold-start latency inherent in traditional monolithic sparsity approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

On-demand sparsity framework minimizes task switching costs
Decomposes weights into reusable block-granular units
Dynamically loads only differential blocks between tasks
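The innovation bullets above hinge on how well block-level sparsity masks align across tasks. The sketch below is illustrative only (it is not the paper's alignment algorithm): given boolean block masks, it scores what fraction of the target task's blocks are already resident, and hence what fraction must actually be loaded on a switch.

```python
import numpy as np

def block_overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of the target task's kept blocks shared with the current task."""
    kept_b = int(mask_b.sum())
    if kept_b == 0:
        return 1.0  # target keeps no blocks, so nothing needs loading
    return float(np.logical_and(mask_a, mask_b).sum()) / kept_b

def differential_fraction(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of the target task's blocks that must actually be loaded."""
    return 1.0 - block_overlap(mask_a, mask_b)

# 64 blocks; the two tasks share a common sparse structure except for the
# first 8 blocks, which are flipped to be task-specific.
rng = np.random.default_rng(0)
mask_a = rng.random(64) < 0.5
mask_b = mask_a.copy()
mask_b[:8] = ~mask_b[:8]

print(f"overlap:   {block_overlap(mask_a, mask_b):.2f}")
print(f"must load: {differential_fraction(mask_a, mask_b):.2f}")
```

Aligning sparse structures across tasks is, in these terms, choosing per-task masks that drive `differential_fraction` toward zero for the task pairs that switch most often.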
Authors

Lianming Huang, City University of Hong Kong
Haibo Hu, City University of Hong Kong
Qiao Li, Mohamed bin Zayed University of Artificial Intelligence
Nan Guan, City University of Hong Kong (Cyber-Physical Systems, Embedded Systems, Real-time Systems)
Chun Jason Xue, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) (Systems and Storage)