🤖 AI Summary
Low-latency services in modern datacenters (e.g., Memcached, MySQL) cannot exploit deep CPU idle states (C-states) because of their long wake-up latency, which undermines energy efficiency. This paper proposes AgileWatts (AW), a deep-idle CPU core architecture with two new C-states, C6A and C6AE, that relaxes the conventional power–latency trade-off. Its three core ideas are: (1) medium-grained power gates, with the core context retained in the power-ungated domain, (2) L1/L2 caches kept power-ungated, with cache sleep mode and leakage reduction achieved by lowering the core voltage to the minimum operational level, and (3) an all-digital PLL (ADPLL) that stays active and locked during the idle state. Evaluation shows that C6A and C6AE reduce transition time by up to 900× relative to C6 while consuming only 7% and 5% of active-state (C0) power, respectively. For Memcached, energy consumption drops by up to 71% (35% on average) with less than 1% end-to-end performance degradation. Similar trends hold for MySQL and Kafka.
📝 Abstract
User-facing applications running in modern datacenters exhibit irregular request patterns and are implemented using a multitude of services with tight latency requirements (30–250 $\mu$s). These characteristics render existing energy-conserving techniques ineffective when processors are idle due to the long transition time (order of 100 $\mu$s) from a deep CPU core idle power state (C-state). While prior works propose management techniques to mitigate this inefficiency, we tackle it at its root with AgileWatts (AW): a new deep CPU core C-state architecture optimized for datacenter server processors targeting latency-sensitive applications. AW drastically reduces the transition latency from deep CPU core idle power states while retaining most of their power savings based on three key ideas. First, AW eliminates the latency (several microseconds) of saving/restoring the core context when powering-off/-on the core in a deep idle state by i) implementing medium-grained power-gates, carefully distributed across the CPU core, and ii) retaining context in the power-ungated domain. Second, AW eliminates the flush latency (several tens of microseconds) of the L1/L2 caches when entering a deep idle state by keeping L1/L2 content power-ungated. A small control logic also remains ungated to serve cache coherence traffic. AW implements cache sleep-mode and leakage reduction for the power-ungated domain by lowering a core’s voltage to the minimum operational level. Third, using a state-of-the-art power-efficient all-digital phase-locked loop (ADPLL) clock generator, AW keeps the PLL active and locked during the idle state, cutting microseconds of wake-up latency at negligible power cost. Our evaluation with an accurate industrial-grade simulator calibrated against an Intel Skylake server shows that AW reduces the energy consumption of Memcached by up to 71% (35% on average) with <1% end-to-end performance degradation.
We observe similar trends for other evaluated services (MySQL and Kafka). AW’s new deep C-states C6A and C6AE reduce transition time by up to 900$\times$ as compared to the deepest existing idle state C6, while consuming only 7% and 5% of the active state (C0) power, respectively.
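To make the headline numbers easier to reason about, the following is a minimal back-of-envelope sketch, not the paper's evaluation methodology. It uses only figures quoted in the abstract (7% and 5% of C0 power, a C6 transition time on the order of 100 $\mu$s, and an up-to-900$\times$ reduction). Everything else is an assumption made for illustration: the function names, the normalization of C0 power to 1, treating "order of 100 $\mu$s" as exactly 100 $\mu$s, and ignoring the energy spent during transitions.

```python
# Toy model only: all names here are hypothetical and not taken from the paper.

C0_POWER = 1.0  # active-state power, normalized to 1 (assumption)

# Idle power as a fraction of C0 power, as quoted in the abstract.
IDLE_POWER_FRACTION = {"C6A": 0.07, "C6AE": 0.05}

# The abstract gives "order of 100 us" for C6; we assume exactly 100 us.
C6_TRANSITION_US = 100.0

# The abstract gives "up to 900x"; using it directly yields a best case.
MAX_TRANSITION_REDUCTION = 900.0


def average_power(utilization: float, idle_fraction: float) -> float:
    """Average core power (relative to C0) for a core that is active for
    `utilization` of the time and otherwise sits in an idle state drawing
    `idle_fraction` of C0 power. Transition energy is ignored (assumption)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    active = utilization * C0_POWER
    idle = (1.0 - utilization) * idle_fraction * C0_POWER
    return active + idle


def best_case_transition_us() -> float:
    """Best-case transition time implied by the quoted upper bound."""
    return C6_TRANSITION_US / MAX_TRANSITION_REDUCTION


def transition_share(transition_us: float, budget_us: float) -> float:
    """Fraction of a request's latency budget consumed by one transition."""
    if budget_us <= 0:
        raise ValueError("budget_us must be positive")
    return transition_us / budget_us


if __name__ == "__main__":
    for state, fraction in IDLE_POWER_FRACTION.items():
        print(state, round(average_power(0.2, fraction), 3))
    # 30 us is the tight end of the 30-250 us range quoted in the abstract.
    print(round(transition_share(C6_TRANSITION_US, 30.0), 2))
    print(round(transition_share(best_case_transition_us(), 30.0), 4))
```

Under these assumptions, a core that is busy 20% of the time would average about 0.256 of C0 power with C6A and 0.24 with C6AE. A 100 $\mu$s transition alone exceeds a 30 $\mu$s latency budget, whereas the best-case reduced transition (roughly 0.11 $\mu$s) takes well under 1% of it. The measured Memcached savings reported in the abstract come from the paper's calibrated simulator and are not reproduced by this toy model.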