🤖 AI Summary
This work addresses the challenge of efficiently adapting heterogeneous accelerators for multi-DNN inference in edge computing systems, where existing approaches suffer from high Service Level Objective (SLO) violation rates. The authors propose SparseLoom, a system that introduces, for the first time, a retraining-free model stitching mechanism. By dynamically recombining sparse subgraphs, SparseLoom generates hardware-adaptive model variants on the fly and integrates heterogeneous task scheduling with memory optimization to enable efficient collaborative inference on edge SoCs. Experimental results show that SparseLoom reduces SLO violations by up to 74%, improves throughput by up to 2.31×, and lowers average memory overhead by 28% compared to state-of-the-art solutions.
📝 Abstract
Modern edge applications increasingly require multi-DNN inference systems to execute tasks on heterogeneous processors, gaining performance both from concurrent execution and from matching each model to the best-suited accelerator. However, existing systems support only a single model (or a few sparse variants) per task, which impedes this matching and results in high Service Level Objective (SLO) violation rates. We introduce model stitching for multi-DNN inference systems, which creates model variants by recombining subgraphs from sparse models without retraining. We present a demonstrator system, SparseLoom, which shows that model stitching can be deployed on SoCs. We show experimentally that SparseLoom reduces SLO violation rates by up to 74%, improves throughput by up to 2.31×, and lowers memory overhead by an average of 28% compared to state-of-the-art multi-DNN inference systems.
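To make the stitching idea concrete, here is a minimal toy sketch of how one might select per-block sparse variants under a latency budget. This is not SparseLoom's actual algorithm; the `Variant` fields, block names, and numbers are all illustrative assumptions, and the greedy selection stands in for whatever search the real system performs.

```python
from dataclasses import dataclass

# Toy illustration of model stitching: each model is split into blocks,
# and every block has several pre-pruned sparse variants that share the
# same block interface, so they can be recombined without retraining.
# A "stitched" model picks one variant per block so total latency fits
# the target accelerator's budget.

@dataclass(frozen=True)
class Variant:
    name: str         # hypothetical identifier, e.g. "b1-sparse"
    latency_ms: float # measured latency of this block variant (made up)
    accuracy: float   # proxy quality score of this variant (made up)

def stitch(blocks, latency_budget_ms):
    """Greedy stitcher: per block, take the most accurate variant that
    still leaves enough budget for the cheapest variants of later blocks."""
    chosen = []
    remaining = latency_budget_ms
    for i, variants in enumerate(blocks):
        # Minimum possible cost of all blocks after this one.
        tail_min = sum(min(v.latency_ms for v in vs) for vs in blocks[i + 1:])
        feasible = [v for v in variants if v.latency_ms + tail_min <= remaining]
        if not feasible:  # budget too tight: fall back to the cheapest variant
            feasible = [min(variants, key=lambda v: v.latency_ms)]
        best = max(feasible, key=lambda v: v.accuracy)
        chosen.append(best)
        remaining -= best.latency_ms
    return chosen

blocks = [
    [Variant("b1-dense", 4.0, 0.95), Variant("b1-sparse", 2.0, 0.90)],
    [Variant("b2-dense", 5.0, 0.93), Variant("b2-sparse", 2.5, 0.88)],
]
plan = stitch(blocks, latency_budget_ms=7.0)
print([v.name for v in plan])  # → ['b1-dense', 'b2-sparse']
```

Under a 7 ms budget the sketch keeps the dense first block but swaps in the sparse second block, mirroring how a stitched variant can be tailored to a specific accelerator's speed without touching model weights.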