🤖 AI Summary
To predict program execution time early in software development, when running the full binary is infeasible, this paper proposes a lightweight timing prediction method that avoids executing the full binary. The approach builds an LLVM-based, hardware-aware simulation environment at the IR level to extract fine-grained, timing-sensitive features, including executed instruction counts, cache accesses, and branch misprediction estimates. It combines these features with historical performance data to train supervised learning models (e.g., XGBoost and Random Forest). According to the summary, this is the first approach to tightly couple LLVM IR-level execution semantics with hardware behavior modeling (cache and branch prediction), which addresses limitations of prior methods that rely solely on static source-code analysis or post-execution profiling. Evaluated on public benchmarks, the method achieves a mean absolute percentage error (MAPE) of 11.98%, outperforming state-of-the-art approaches, while prediction takes on the order of minutes.
📝 Abstract
We introduce preti, a novel framework for predicting software execution time during the early stages of development. preti uses an LLVM-based simulation environment to extract timing-related runtime information, such as the count of executed LLVM IR instructions. This information, combined with historical execution time data, is used to train machine learning models for time prediction. To further improve prediction accuracy, our approach incorporates simulations of cache accesses and branch prediction. Evaluations on public benchmarks demonstrate that preti achieves an average absolute percentage error (APE) of 11.98%, surpassing state-of-the-art methods. These results underscore the effectiveness and efficiency of preti as a solution for early-stage timing analysis.
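To make the pipeline described above concrete, the following is a minimal sketch of its shape: simulated features go in, a model fitted on historical timings predicts execution time, and the result is scored with the average absolute percentage error. Everything in it is an assumption made for illustration. The feature layout (IR instruction count, cache misses, branch mispredictions), the function names, and the numbers are hypothetical, and a plain least-squares linear model stands in for whatever models preti actually trains.

```python
# Illustrative sketch only: synthetic data, hypothetical names, and a linear
# least-squares model used as a stand-in for the paper's learned models.

def mean_ape(actual, predicted):
    """Average absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_linear(X, y):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(weights, X):
    """Predicted time for each feature row under the linear model."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]

# Hypothetical historical data: one row per program run, holding
# (executed IR instructions, cache misses, branch mispredictions).
train_features = [
    [1000, 10, 5],
    [2000, 40, 8],
    [1500, 5, 20],
    [3000, 60, 30],
    [500, 2, 1],
]
# Synthetic "measured" times in nanoseconds for the rows above.
train_times = [2075.0, 6120.0, 2300.0, 9450.0, 715.0]

weights = fit_linear(train_features, train_times)

# A held-out program with a synthetic measured time.
test_features = [[2500, 20, 10]]
test_times = [4800.0]
predictions = predict(weights, test_features)
error_percent = mean_ape(test_times, predictions)
print(f"weights={weights}, prediction={predictions[0]:.1f}, APE={error_percent:.3f}%")
```

A linear model keeps the sketch dependency-free and easy to inspect, since each weight reads as a per-event cost. A tree ensemble would replace only `fit_linear` and `predict`, leaving the feature extraction and the error metric unchanged.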