AI Summary
U.S. academic HPC resources are growing at a compound annual growth rate (CAGR) of only 18%, substantially lagging behind national (43%) and industrial (78%) rates, revealing a structural capability gap, particularly for GPU-intensive AI workloads. Method: This study systematically assesses the current state of HPC infrastructure across U.S. universities, benchmarking against DOE leadership-class systems and industrial AI infrastructure through integrated analysis of computational capacity, architectural evolution, governance models, and energy efficiency. Contribution/Results: We propose a novel federated computing framework, a dynamic idle-GPU scheduling mechanism, and a fair cost-allocation model; additionally, we explore a decentralized reinforcement learning paradigm to enhance the accessibility and sustainability of campus AI training resources. The findings provide empirically grounded, actionable recommendations for optimizing academic HPC policy and fostering cross-sectoral technological collaboration.
Abstract
The rapid growth of AI, data-intensive science, and digital twin technologies has driven an unprecedented demand for high-performance computing (HPC) across the research ecosystem. While national laboratories and industrial hyperscalers have invested heavily in exascale and GPU-centric architectures, university-operated HPC systems remain comparatively under-resourced. This survey presents a comprehensive assessment of the HPC landscape across U.S. universities, benchmarking their capabilities against Department of Energy (DOE) leadership-class systems and industrial AI infrastructures. We examine over 50 premier research institutions, analyzing compute capacity, architectural design, governance models, and energy efficiency. Our findings reveal that university clusters, though vital for academic research, exhibit significantly lower growth trajectories (CAGR $\approx$ 18%) than their national ($\approx$ 43%) and industrial ($\approx$ 78%) counterparts. The increasing skew toward GPU-dense AI workloads has widened the capability gap, highlighting the need for federated computing, idle-GPU harvesting, and cost-sharing models. We also identify emerging paradigms, such as decentralized reinforcement learning, as promising opportunities for democratizing AI training within campus environments. Ultimately, this work provides actionable insights for academic leaders, funding agencies, and technology partners to ensure more equitable and sustainable HPC access in support of national research priorities.
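To make concrete how quickly these growth-rate differences compound into a capability gap, the following is a back-of-envelope sketch: it applies the reported CAGRs (18%, 43%, 78%) to a common normalized starting capacity over a hypothetical five-year horizon. The starting capacity and time horizon are illustrative assumptions, not figures from the survey.

```python
# Illustrative compounding of the reported CAGRs from a common,
# normalized starting capacity of 1.0 (assumption, not survey data).

def projected_capacity(cagr: float, years: int, base: float = 1.0) -> float:
    """Compound a base capacity at the given annual growth rate."""
    return base * (1 + cagr) ** years

# CAGRs reported in the survey; horizon is a hypothetical planning window.
cagrs = {"academic": 0.18, "national": 0.43, "industrial": 0.78}
horizon = 5  # years

for sector, rate in cagrs.items():
    print(f"{sector:10s}: {projected_capacity(rate, horizon):5.1f}x")
```

Under these assumptions, academic capacity grows roughly 2.3x over five years while industrial capacity grows almost 18x, so the relative gap widens by nearly an order of magnitude even though all three sectors are expanding.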