🤖 AI Summary
Current CPU architecture-level power modeling faces two critical bottlenecks: (1) the absence of publicly available, realistic, fine-grained open-source datasets, and (2) the difficulty of generating data that faithfully reflects real-world CPU design flows, a shortcoming that affects even private in-house datasets. To address these challenges, we introduce ArchPower, the first open-source dataset specifically designed for architecture-level power modeling of modern CPUs. It comprises 200 samples collected from 25 CPU configurations, each executing 8 workloads. Each sample includes more than 100 architectural features (both hardware and event parameters) and fine-grained ground-truth power labels covering the total design power and the power of each of 11 components, with every power value further decomposed into four groups (combinational logic, sequential logic, memory, and clock). Data generation follows industrial CPU design practices, integrating RTL modeling with reproducible, cycle-accurate power simulation. ArchPower is publicly released on GitHub. Empirical evaluation demonstrates significantly improved power prediction accuracy during early design stages, establishing the first standardized benchmark for machine learning–driven architecture-level power modeling.
📝 Abstract
Power is the primary design objective of large-scale integrated circuits (ICs), especially for complex modern processors (i.e., CPUs). Accurate CPU power evaluation requires designers to complete the entire time-consuming IC implementation process, which can easily take months. At the early design stage (e.g., architecture level), classical power models are notoriously inaccurate. Recently, ML-based architecture-level power models have been proposed to boost accuracy, but data availability remains a severe challenge. Currently, there is no open-source dataset for this important ML application. A typical dataset generation process involves correctly implementing CPU designs and repeatedly executing power simulation flows, requiring significant design expertise, engineering effort, and execution time. Even private in-house datasets often fail to reflect realistic CPU design scenarios. In this work, we propose ArchPower, the first open-source dataset for architecture-level processor power modeling. We run complex and realistic design flows to collect CPU architectural information as features and the ground-truth simulated power as labels. Our dataset includes 200 CPU data samples, collected from 25 different CPU configurations executing 8 different workloads. Each data sample contains more than 100 architectural features, including both hardware and event parameters. The label of each sample provides fine-grained power information, including the total design power and the power of each of the 11 components. Each power value is further decomposed into four fine-grained power groups: combinational logic power, sequential logic power, memory power, and clock power. ArchPower is available at https://github.com/hkust-zhiyao/ArchPower.
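To make the sample layout described above concrete, here is a minimal Python sketch of how one ArchPower sample could be held in memory. It is not the repository's actual file format or loader. The class names, field names, workload labels (`w0` to `w7`), and component names are all hypothetical. It also assumes that the four power groups add up to each power value and that all values share one unit.

```python
from dataclasses import dataclass, field
from itertools import product

# The four fine-grained power groups named in the abstract.
POWER_GROUPS = ("combinational", "sequential", "memory", "clock")
NUM_COMPONENTS = 11  # per the abstract; the component names are not listed here


@dataclass
class PowerBreakdown:
    # Assumption: the four groups are additive and share one unit (e.g. mW).
    combinational: float
    sequential: float
    memory: float
    clock: float

    def total(self) -> float:
        return self.combinational + self.sequential + self.memory + self.clock

    def as_vector(self) -> list:
        # Fixed group order so every sample flattens identically.
        return [getattr(self, g) for g in POWER_GROUPS]


@dataclass
class Sample:
    config_id: int  # hypothetical: index of one of the 25 CPU configurations
    workload: str  # hypothetical: label of one of the 8 workloads
    features: dict  # >100 hardware and event parameters, name -> value
    design_power: PowerBreakdown  # total design power, split into 4 groups
    component_power: dict = field(default_factory=dict)  # name -> PowerBreakdown

    def label_vector(self) -> list:
        # Total design power first, then each component in sorted-name order.
        vec = self.design_power.as_vector()
        for name in sorted(self.component_power):
            vec.extend(self.component_power[name].as_vector())
        return vec


def sample_keys(num_configs: int = 25, num_workloads: int = 8) -> list:
    # Every (configuration, workload) pair corresponds to one data sample.
    workloads = [f"w{i}" for i in range(num_workloads)]
    return list(product(range(num_configs), workloads))
```

With 25 configurations and 8 workloads, `sample_keys()` yields 25 × 8 = 200 (configuration, workload) pairs. A sample that carries all 11 components flattens to a label vector of (1 + 11) × 4 = 48 values, which is one plausible target layout for a multi-output regression model.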