🤖 AI Summary
This study addresses the high energy consumption of data center cooling systems and the lack of quantification of the efficiency gap between theoretically optimal and practically deployable control strategies. Focusing on the liquid-cooled Frontier supercomputer, the authors develop and validate a Modelica-based digital twin surrogate model (CV-RMSE < 2.7%, NMBE < 2.5%) and propose a hierarchical optimization framework incorporating actuator rate constraints to jointly optimize coolant flow rate and supply temperature. For the first time in a real-world supercomputing system, this work quantifies the gap between theoretical and actual control performance, revealing that coordinated optimization can nearly double energy savings and exposing significant overcooling in the baseline system. The constrained implementation achieves a 27.8% reduction in total energy consumption, with up to 30.1% savings attainable in an unconstrained scenario.
📝 Abstract
Data center cooling systems consume significant auxiliary energy, yet optimization studies rarely quantify the gap between theoretically optimal and operationally deployable control strategies. This paper develops a digital twin of the liquid cooling infrastructure at the Frontier exascale supercomputer, in which a hot-temperature water system comprises three parallel subloops, each serving dedicated coolant distribution unit clusters through plate heat exchangers and variable-speed pumps. The surrogate model is built in Modelica and validated against one full calendar year of 10-minute operational data following ASHRAE Guideline 14. At the subloop level, the model achieves a coefficient of variation of the root mean square error (CV-RMSE) below 2.7% and a normalized mean bias error (NMBE) within 2.5%. Using this validated surrogate model, a layered optimization framework evaluates three progressively constrained strategies: analytical flow-only optimization achieves a 20.4% total energy saving, unconstrained joint optimization of flow rate and supply temperature achieves 30.1%, and ramp-constrained joint optimization of flow rate and supply temperature, which enforces actuator rate limits, achieves 27.8%. The analysis reveals that the baseline system operates at 2.9 times the minimum thermally safe flow rate, and that co-optimizing supply temperature with flow rate nearly doubles the savings achievable by flow reduction alone.
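The two validation metrics named above, the coefficient of variation of the root mean square error and the normalized mean bias error, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the commonly used ASHRAE Guideline 14 form with a degrees-of-freedom adjustment `p` (defaulting to 1) and a measured-minus-simulated sign convention. The function names and the sample values in the usage note are hypothetical.

```python
import math


def cv_rmse(measured, simulated, p=1):
    """CV-RMSE in percent; p is an assumed degrees-of-freedom adjustment."""
    n = len(measured)
    mean_measured = sum(measured) / n
    # Sum of squared residuals between measurement and model output.
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_measured


def nmbe(measured, simulated, p=1):
    """NMBE in percent; positive means the model under-predicts on average
    (assumed measured-minus-simulated sign convention)."""
    n = len(measured)
    mean_measured = sum(measured) / n
    # Signed residuals, so over- and under-predictions can cancel.
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean_measured)
```

For example, calling `cv_rmse([10, 10, 10, 10, 10], [9, 11, 9, 11, 10])` on these made-up samples gives a nonzero scatter, while `nmbe` on the same data is zero because the errors cancel. This is why the two metrics are reported together: NMBE captures systematic bias, and CV-RMSE captures point-by-point deviation.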