🤖 AI Summary
Problem: Existing methodologies for designing run-time power monitoring infrastructure rely on time-consuming gate-level simulation to train the power model, which leads to lengthy design cycles and poor scalability as platform complexity grows. Method: This paper proposes Blink, an automated power modeling methodology for FPGA-based heterogeneous multicore platforms that is jointly driven by behavioral-level simulation and hardware-measured power traces, eliminating gate-level simulation from model training. Contribution/Results: We introduce the first integration of high-abstraction behavioral simulation with measured power traces to construct lightweight, transferable run-time power models. Combined with HLS accelerator integration and an automated modeling framework, the approach enables rapid end-to-end deployment. Evaluated on heterogeneous designs with multiple HLS-generated accelerators, it reduces time-to-solution by 18× on average without degrading the quality of the run-time power estimates. The methodology thereby improves the development efficiency and scalability of power monitoring infrastructure for complex FPGA systems.
📝 Abstract
Current over-provisioned heterogeneous multicores require effective run-time optimization strategies, and the run-time power monitoring subsystem is paramount for their success. Several state-of-the-art methodologies address the design of a run-time power monitoring infrastructure for generic computing platforms. However, training the power model requires time-consuming gate-level simulations that, coupled with the ever-increasing complexity of modern heterogeneous platforms, dramatically hinder the usability of such solutions. This paper introduces Blink, a scalable framework for the fast and automated design of run-time power monitoring infrastructures targeting computing platforms implemented on FPGA. Blink shortens the time-to-solution for delivering the run-time power monitoring infrastructure by replacing the gate-level simulations and power trace computations of traditional methodologies with behavioral simulations and direct power trace measurements. Applying Blink to multiple designs that mix HLS-generated accelerators from a state-of-the-art benchmark suite demonstrates an average time-to-solution speedup of 18 times without affecting the quality of the run-time power estimates.
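To make the training step more concrete, the following is a minimal sketch of how a run-time power model could be fitted from per-window switching-activity counts (as a behavioral simulation might report them) and directly measured power samples. The abstract does not specify the form of Blink's model, so the linear least-squares model used here is an assumption, and the names `fit_linear_power_model` and `estimate_power` are hypothetical and not part of Blink.

```python
# Hypothetical sketch: NOT Blink's actual implementation.
# Assumption: power is modeled as a linear function of activity counts.

def fit_linear_power_model(activity, power):
    """Least-squares fit of power ~ w0 + sum_j w_j * activity[j].

    activity: one row per time window, each a list of toggle counts
              (assumed to come from a behavioral simulation)
    power:    one measured power sample per time window
    """
    # Prepend a constant 1.0 so the first weight captures static power.
    rows = [[1.0] + [float(c) for c in r] for r in activity]
    n = len(rows[0])
    # Build the normal equations (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(n)]
    # Solve with Gauss-Jordan elimination and partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("activity columns are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(n):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [x - f * y for x, y in zip(a[k], a[col])]
                b[k] -= f * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def estimate_power(weights, counts):
    """Evaluate the fitted model for one window of activity counts."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], counts))
```

In this sketch, the fitted weights would be what a run-time monitor evaluates: `estimate_power(weights, counts)` turns the activity counts observed in one window into a power estimate. The point of the illustration is only that both inputs to the fit can be obtained without gate-level simulation.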