🤖 AI Summary
On integrated NPUs, multi-tenant DNN workloads sharing on-chip caches suffer from unpredictable cache conflicts and low utilization. Method: This paper proposes a hardware–software co-design approach: (1) a novel cache partitioning mechanism enabling both model-exclusive and NPU-controllable shared caching; and (2) a joint scheduling algorithm combining capacity-aware static mapping with runtime dynamic quota adjustment. Contributions/Results: The hardware implementation is lightweight and scalable; the software scheduler ensures fairness and efficiency. Experiments show an average 33.4% reduction in memory accesses, up to 2.56× single-model speedup, and 1.88× average speedup across workloads—significantly improving cache efficiency and system throughput in multi-tenant scenarios.
📝 Abstract
With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although prior works have proposed many methods to improve multi-tenant performance, the impact of the shared cache is not well studied. This paper proposes CaMDN, an architecture–scheduling co-design to enhance cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture is proposed to support model-exclusive, NPU-controlled regions inside the shared cache to eliminate unexpected cache contention. Moreover, a cache scheduling method is proposed to improve shared cache utilization. In particular, it includes a cache-aware mapping method that adapts to the varying available cache capacity and a dynamic allocation algorithm that adjusts cache usage among co-located DNNs at runtime. Compared to prior works, CaMDN reduces memory accesses by 33.4% on average and achieves a model speedup of up to 2.56× (1.88× on average).
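To make the runtime-adjustment idea concrete, here is a minimal, hypothetical sketch of dynamic cache-quota reallocation among co-located models. The way counts, model names, miss-rate inputs, and the one-way-per-epoch heuristic are all illustrative assumptions, not CaMDN's actual algorithm.

```python
def adjust_quotas(quotas, miss_rates, min_ways=1):
    """Hypothetical heuristic: each epoch, shift one shared-cache way
    from the model with the lowest miss rate to the model with the
    highest, keeping the total number of ways fixed.

    quotas: dict model -> cache ways currently assigned (assumed unit)
    miss_rates: dict model -> measured miss rate for the last epoch
    """
    models = list(quotas)
    hungriest = max(models, key=lambda m: miss_rates[m])  # most misses
    richest = min(models, key=lambda m: miss_rates[m])    # fewest misses
    # Only move a way if the donor stays above its minimum allocation.
    if hungriest != richest and quotas[richest] > min_ways:
        quotas[richest] -= 1
        quotas[hungriest] += 1
    return quotas

# Illustrative use: two co-located models sharing 16 ways (mock numbers).
quotas = {"resnet50": 8, "bert": 8}
miss_rates = {"resnet50": 0.31, "bert": 0.07}
print(adjust_quotas(quotas, miss_rates))
# → {'resnet50': 9, 'bert': 7}
```

The key property such a scheduler must preserve is that the total allocation never exceeds the physical cache, which the single-way swap guarantees by construction.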