🤖 AI Summary
This work addresses the discrepancies between microarchitectural simulators and actual RTL implementations, which undermine the reliability of performance prediction. To this end, we propose Microarchitecture Cliffs, a benchmark generation methodology that constructs targeted test cases so that simulator inaccuracies can be attributed precisely to individual microarchitectural features, accompanied by automated tools that streamline the calibration workflow. Applied to the XiangShan version of gem5 (XS-GEM5), calibrated against the XiangShan open-source CPU RTL (XS-RTL), the approach reduces the performance error of XS-GEM5 from 59.2% to 1.4% on the Cliff benchmarks. It also lowers the absolute performance error by 15.1% and 21.0% on SPECint2017 and SPECfp2017, respectively, and cuts the relative error of a representative tightly coupled microarchitectural feature by 48.03%, improving both calibration efficiency and interpretability.
📝 Abstract
Architectural simulators play a critical role in early microarchitectural exploration due to their flexibility and high productivity. However, their effectiveness is often constrained by fidelity: a simulator may deviate from the behavior of the final RTL, leading to unreliable performance estimates. Consequently, model calibration, which aligns simulator behavior with the RTL as the ground-truth microarchitecture, becomes essential for accurate performance modeling. To improve calibration accuracy, we propose Microarchitecture Cliffs, a benchmark generation methodology designed to expose mismatches in microarchitectural behavior between the simulator and the RTL. Once the key architectural components that require calibration have been identified, the Cliff methodology uses a set of benchmarks to attribute each microarchitectural difference precisely to a single microarchitectural feature. In addition, we develop a set of automated tools to improve the efficiency of the Cliff workflow. We apply the Cliff methodology to calibrate the XiangShan version of gem5 (XS-GEM5) against the XiangShan open-source CPU (XS-RTL), reducing the performance error of XS-GEM5 from 59.2% to 1.4% on the Cliff benchmarks. The Cliff-guided calibration also reduces the relative error of a representative tightly coupled microarchitectural feature by 48.03%, and it lowers the absolute performance error by 15.1% and 21.0% on SPECint2017 and SPECfp2017, respectively.
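
The abstract does not define how the performance error is computed. As a rough illustration only, the sketch below assumes a common convention: the per-benchmark relative IPC error of the simulator with the RTL taken as ground truth, averaged over the benchmark set. The function names, benchmark names, and IPC values are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def relative_error(sim_ipc: float, rtl_ipc: float) -> float:
    """Relative IPC error of the simulator, treating the RTL as ground truth."""
    if rtl_ipc <= 0:
        raise ValueError("RTL IPC must be positive")
    return abs(sim_ipc - rtl_ipc) / rtl_ipc


def mean_relative_error(sim: dict[str, float], rtl: dict[str, float]) -> float:
    """Average the relative error over benchmarks present in both result sets."""
    common = sorted(sim.keys() & rtl.keys())
    if not common:
        raise ValueError("no benchmarks in common")
    return mean(relative_error(sim[name], rtl[name]) for name in common)


# Hypothetical IPC numbers, chosen only to make the arithmetic easy to follow.
rtl_ipc = {"bench_a": 2.0, "bench_b": 1.0}
sim_before = {"bench_a": 3.0, "bench_b": 1.5}   # uncalibrated simulator
sim_after = {"bench_a": 2.02, "bench_b": 0.99}  # after calibration

error_before = mean_relative_error(sim_before, rtl_ipc)
error_after = mean_relative_error(sim_after, rtl_ipc)
print(f"before: {error_before:.1%}, after: {error_after:.1%}")
```

Under this assumed metric, a calibration step is judged by how far it moves the averaged error toward zero. Note that the averaged figure hides which microarchitectural feature caused the mismatch, which is the gap that single-feature benchmarks are meant to close.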