CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

📅 2025-05-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
On integrated NPUs, multi-tenant DNN workloads sharing on-chip caches suffer from unpredictable cache conflicts and low utilization. Method: This paper proposes a hardware–software co-design approach: (1) a novel cache partitioning mechanism enabling both model-exclusive and NPU-controllable shared caching; and (2) a joint scheduling algorithm combining capacity-aware static mapping with runtime dynamic quota adjustment. Contributions/Results: The hardware implementation is lightweight and scalable; the software scheduler ensures fairness and efficiency. Experiments show an average 33.4% reduction in memory accesses, up to 2.56× single-model speedup, and 1.88× average speedup across workloads—significantly improving cache efficiency and system throughput in multi-tenant scenarios.
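The capacity-aware static mapping idea can be sketched as follows: choose the largest tile for a layer whose working set (input tile, weights, output tile) fits within the cache quota currently granted to that model. This is a minimal illustration only; the function names and the simple byte-count cost model are assumptions for exposition, not details taken from the CaMDN paper.

```python
# Hypothetical sketch of capacity-aware mapping (not the paper's algorithm).
# Pick the largest square output tile of a k x k convolution whose working
# set fits inside the cache quota granted to one model.

def tile_working_set_bytes(tile_h, tile_w, in_ch, out_ch, k, elem=1):
    """Bytes touched per tile: input halo, weights, and output (int8 default)."""
    in_tile = (tile_h + k - 1) * (tile_w + k - 1) * in_ch * elem
    weights = k * k * in_ch * out_ch * elem
    out_tile = tile_h * tile_w * out_ch * elem
    return in_tile + weights + out_tile

def pick_tile(cache_quota_bytes, in_ch, out_ch, k, max_tile=64):
    """Return the largest tile size whose working set fits the quota."""
    for t in range(max_tile, 0, -1):
        if tile_working_set_bytes(t, t, in_ch, out_ch, k) <= cache_quota_bytes:
            return t
    return 0  # even a 1x1 tile does not fit the quota
```

Under this model, a shrinking quota (e.g. because a co-located model was granted more cache) yields a smaller tile, so the mapping adapts to the available capacity rather than thrashing a fixed footprint.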

📝 Abstract
With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant performance, the impact of shared cache is not well studied. This paper proposes CaMDN, an architecture-scheduling co-design to enhance cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture is proposed to support model-exclusive, NPU-controlled regions inside shared cache to eliminate unexpected cache contention. Moreover, a cache scheduling method is proposed to improve shared cache utilization. In particular, it includes a cache-aware mapping method for adaptability to the varying available cache capacity and a dynamic allocation algorithm to adjust the usage among co-located DNNs at runtime. Compared to prior works, CaMDN reduces the memory access by 33.4% on average and achieves a model speedup of up to 2.56× (1.88× on average).
Problem

Research questions and friction points this paper is trying to address.

Improving cache efficiency for multi-tenant DNNs on NPUs
Eliminating cache contention in shared cache environments
Enhancing shared cache utilization through dynamic allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight architecture for exclusive cache regions
Cache-aware mapping for varying cache capacity
Dynamic allocation algorithm for runtime adjustment
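One plausible shape for the runtime dynamic allocation is a periodic redistribution of shared-cache ways among co-located models in proportion to their recent miss counts, with a guaranteed minimum share for fairness. The policy below is a hedged sketch under those assumptions; the paper's actual algorithm may differ.

```python
# Hypothetical sketch of runtime quota adjustment among co-located DNNs
# (illustrative policy, not the paper's algorithm). Redistribute shared-cache
# ways in proportion to each model's recent miss count, guaranteeing every
# model a minimum share and handing exactly total_ways out.

def adjust_quotas(miss_counts, total_ways, min_ways=1):
    """Map model -> cache ways; proportional to misses, each >= min_ways."""
    n = len(miss_counts)
    assert total_ways >= n * min_ways, "not enough ways for minimum shares"
    spare = total_ways - n * min_ways
    total_miss = sum(miss_counts.values())
    quotas, remainders, used = {}, [], 0
    for model, misses in sorted(miss_counts.items()):
        # fall back to an equal split when no misses were observed
        share = spare * misses / total_miss if total_miss else spare / n
        quotas[model] = min_ways + int(share)
        used += int(share)
        remainders.append((share - int(share), model))
    # hand out leftover ways to the largest fractional remainders
    for _, model in sorted(remainders, reverse=True)[: spare - used]:
        quotas[model] += 1
    return quotas
```

A model with a hot working set (high miss count) gradually gains ways at the expense of models that are cache-insensitive, which matches the stated goal of balancing fairness against overall cache efficiency.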
Tianhao Cai
State Key Laboratory of CCSE, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China
Liang Wang
State Key Laboratory of CCSE, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China
Limin Xiao
FDU
Fiber Optics; Optoelectronics
Meng Han
Intelligence Fusion Research Center (IFRC)
Reliable AI; Data Mining; Machine Learning; Big Data; Security & Privacy
Zeyu Wang
State Key Laboratory of CCSE, Beihang University, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China
Lin Sun
Qihoo 360
Large Language Models
Xiaojian Liao
Beihang University
Storage Systems; AI Systems