🤖 AI Summary
This work addresses the performance degradation that general-purpose cores in heterogeneous SoCs suffer when hardware accelerators with strict deadlines share the system-level cache. The authors propose HyDRA, a dynamic cache management mechanism that models the distinct cache reuse behavior of accelerators. HyDRA combines LERN, a clustering-based reuse predictor, with deadline awareness and dynamic bypass decisions to meet accelerator deadlines while maximizing system throughput. The evaluation, which covers different workloads and varied accelerator configurations, reports significantly improved system performance and a lower accelerator deadline miss rate.
📝 Abstract
The system-level cache is a critical resource shared by processor cores and domain-specific accelerators in heterogeneous systems-on-chip (SoCs). The strict QoS requirements of accelerators, such as deadlines, can severely degrade the performance of processor cores, so managing the shared cache efficiently between cores and accelerators is crucial. State-of-the-art cache management techniques use reuse predictors to perform reuse-aware bypassing of accesses from cores and thereby improve performance. However, architectural differences between accelerators and processor cores, which are often associated with deep cache hierarchies, can lead to significantly different reuse patterns at the shared cache. We propose LERN, a novel clustering-based methodology for learning and predicting the reuse behavior of hardware accelerators at the shared cache. We then propose HyDRA, a deadline- and reuse-aware cache management strategy that explores a novel tradeoff between reuse awareness and deadline awareness for performance efficiency. HyDRA uses LERN to dynamically predict the reuse behavior of accelerator accesses and makes bypass decisions that maximize system throughput while meeting accelerator deadlines. We evaluate HyDRA across different workloads and varied accelerator configurations. It significantly improves system performance and reduces the accelerator deadline miss rate.
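The abstract does not specify how LERN or HyDRA are implemented, so the following is only a minimal illustrative sketch of what a clustering-based reuse prediction combined with a deadline-aware bypass decision could look like. Everything in it is an assumption made for illustration: the `Cluster` type, the feature vectors, the `progress`/`elapsed` deadline model, and both threshold values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Cluster:
    # Hypothetical feature vector summarizing a group of accelerator accesses.
    centroid: tuple
    # Assumed fraction of accesses in this cluster that were reused in the cache.
    reuse_prob: float


def predict_reuse(features, clusters):
    """Return the reuse probability of the cluster whose centroid is nearest."""
    nearest = min(clusters, key=lambda c: dist(c.centroid, features))
    return nearest.reuse_prob


def should_bypass(features, clusters, progress, elapsed,
                  strict_threshold=0.5, relaxed_threshold=0.2):
    """Decide whether an accelerator access should bypass the shared cache.

    progress: fraction of the accelerator's work completed (0..1), assumed known.
    elapsed:  fraction of the deadline period already used (0..1), assumed known.
    """
    # Assumed policy: an accelerator that is behind schedule gets a lower
    # reuse threshold, so more of its accesses are allowed into the cache.
    behind_schedule = progress < elapsed
    threshold = relaxed_threshold if behind_schedule else strict_threshold
    # Accesses predicted to have low reuse bypass the cache, leaving
    # capacity for the processor cores.
    return predict_reuse(features, clusters) < threshold
```

In this sketch, an access with moderate predicted reuse is bypassed while the accelerator is ahead of schedule and cached once it falls behind, which is one simple way to express a tradeoff between reuse awareness and deadline awareness.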