🤖 AI Summary
This study investigates how GPU hardware choices (specifically the NVIDIA A40, A100, L4, and L40) affect performance and energy efficiency in GROMACS-based molecular dynamics (MD) simulations, with emphasis on the effects of graphics clock frequency and power capping. Using realistic biomolecular workloads in GROMACS alongside synthetic benchmarks (Pi Solver, STREAM Triad), we conduct systematic experiments and empirical modeling. Results show that small-scale MD systems are strongly frequency-bound, whereas large-scale systems are memory-bandwidth-limited. Notably, high-end GPUs such as the A100 sustain near-peak performance even under aggressive power reduction (≤50% of TDP), indicating robust energy efficiency. The work quantifies fundamental performance–power trade-offs in MD simulation under hardware constraints, providing empirical guidance for heterogeneous accelerator selection and energy-aware optimization in computational biophysics.
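The split between frequency-bound and bandwidth-limited workloads can be illustrated with a simple roofline-style model, in which runtime is set by whichever limit is slower: compute time, which shrinks as the graphics clock rises, or memory time, which depends only on bandwidth. The sketch below is a hypothetical illustration, not the study's empirical model; the GPU parameters (`FPC`, `BW`), the clock list, and the three workloads are invented numbers chosen only to show the three regimes.

```python
# Roofline-style sketch: runtime is set by whichever limit is slower,
# compute (scales with clock) or memory (fixed by bandwidth).
# Every number here is a hypothetical assumption, not a value from the study.

def kernel_time(flops: float, bytes_moved: float, clock_ghz: float,
                flops_per_cycle: float, mem_bw_gbs: float) -> float:
    """Runtime in seconds: the slower of the compute and memory limits."""
    compute_s = flops / (clock_ghz * 1e9 * flops_per_cycle)  # shrinks as clock rises
    memory_s = bytes_moved / (mem_bw_gbs * 1e9)              # independent of clock
    return max(compute_s, memory_s)


def relative_speedup(flops: float, bytes_moved: float, clocks: list[float],
                     flops_per_cycle: float, mem_bw_gbs: float) -> list[float]:
    """Speedup at each clock, normalized to the first (lowest) clock."""
    times = [kernel_time(flops, bytes_moved, f, flops_per_cycle, mem_bw_gbs)
             for f in clocks]
    return [times[0] / t for t in times]


# Hypothetical GPU: 10,000 flops per cycle, 1000 GB/s memory bandwidth.
CLOCKS = [0.6, 0.9, 1.2, 1.5]  # graphics clock in GHz
FPC, BW = 1e4, 1000.0

workloads = {
    "compute-heavy": (1e12, 1e9),    # analogous to a small, frequency-sensitive system
    "mixed": (1e12, 1.2e11),         # turns memory-bound above ~0.83 GHz
    "memory-heavy": (1e10, 1e11),    # analogous to a bandwidth-limited large system
}
for name, (flops, nbytes) in workloads.items():
    speedups = relative_speedup(flops, nbytes, CLOCKS, FPC, BW)
    print(name, [round(s, 2) for s in speedups])
# compute-heavy [1.0, 1.5, 2.0, 2.5]
# mixed [1.0, 1.39, 1.39, 1.39]
# memory-heavy [1.0, 1.0, 1.0, 1.0]
```

The "mixed" case shows the saturation pattern in miniature: raising the clock helps only until compute time drops below memory time, after which further frequency increases buy nothing.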
📝 Abstract
Molecular dynamics simulations are essential tools in computational biophysics, but their performance depends heavily on hardware choice and configuration. In this work, we present a comprehensive performance analysis of four NVIDIA GPU accelerators (A40, A100, L4, and L40) using six representative GROMACS biomolecular workloads alongside two synthetic benchmarks: Pi Solver (compute-bound) and STREAM Triad (memory-bound). We investigate how performance scales with GPU graphics clock frequency and how workloads respond to power capping. The two synthetic benchmarks define the extremes of frequency scaling: Pi Solver shows ideal compute scalability, while STREAM Triad reveals memory bandwidth limits, providing a frame of reference for GROMACS's performance. Our results reveal distinct frequency scaling behaviors: smaller GROMACS systems exhibit strong frequency sensitivity, while larger systems saturate quickly and become increasingly memory-bound. Under power capping, performance remains stable until architecture- and workload-specific thresholds are reached, with high-end GPUs like the A100 maintaining near-maximum performance even under reduced power budgets. Our findings provide practical guidance for selecting GPU hardware and optimizing GROMACS performance for large-scale MD workflows under power constraints.
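To make the compute-bound versus memory-bound contrast between the two synthetic benchmarks concrete, the sketch below writes out the structure of each kernel. It is a CPU-side, pure-Python illustration, not the study's GPU code. The Pi Solver is assumed here to be midpoint-rule integration of 4/(1+x²) on [0, 1] (an assumption; the abstract does not state its algorithm), and STREAM Triad uses its standard definition a = b + s·c. The helper `triad_intensity` is hypothetical and simply counts flops per byte for double-precision data.

```python
# Minimal sketches of the two synthetic kernels named in the abstract.
# Assumptions: Pi Solver modeled as midpoint-rule integration of 4/(1+x^2)
# on [0, 1]; STREAM Triad follows its standard definition a = b + s*c.
# Illustrates kernel structure only; timings from this code are meaningless.
import math


def pi_solver(n: int) -> float:
    """Approximate pi; each step does a handful of flops but reads no arrays."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h              # midpoint of the i-th sub-interval
        total += 4.0 / (1.0 + x * x)   # pure arithmetic, no memory traffic
    return total * h


def stream_triad(b: list[float], c: list[float], scalar: float) -> list[float]:
    """a[i] = b[i] + scalar * c[i]: two flops per element, three array accesses."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_intensity(bytes_per_element: int = 8) -> float:
    """Flops per byte for Triad: 2 flops vs. 2 reads + 1 write (write-allocate ignored)."""
    return 2 / (3 * bytes_per_element)


print(abs(pi_solver(100_000) - math.pi) < 1e-9)   # True
print(stream_triad([1.0, 2.0], [3.0, 4.0], 2.0))  # [7.0, 10.0]
print(round(triad_intensity(), 4))                # 0.0833
```

At roughly one flop per twelve bytes, Triad's throughput on a GPU is capped by memory bandwidth regardless of clock, whereas the Pi loop keeps all its state in registers and so speeds up in proportion to the graphics clock.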