🤖 AI Summary
This work proposes OMPDataPerf, a framework that addresses performance bottlenecks in heterogeneous OpenMP applications caused by inefficient data mappings, a problem that existing tools struggle to diagnose automatically. Built on the OpenMP Tools Interface (OMPT), OMPDataPerf provides the first automated dynamic detection and profiling of data transfer and allocation patterns in heterogeneous OpenMP programs. By combining runtime instrumentation, dynamic analysis, and performance modeling, it identifies the source code locations responsible for suboptimal data mappings, estimates their optimization potential, and delivers actionable recommendations, all with a geometric-mean runtime overhead of only 5%. This reduces the manual effort traditionally required for performance diagnosis and optimization in heterogeneous parallel programming.
📝 Abstract
With the growing prevalence of heterogeneous computing, CPUs are increasingly being paired with accelerators to achieve new levels of performance and energy efficiency. However, data movement between devices remains a significant bottleneck, complicating application development. Existing performance tools require considerable programmer intervention to diagnose and locate data transfer inefficiencies. To address this, we propose dynamic analysis techniques to detect and profile inefficient data transfer and allocation patterns in heterogeneous applications. We implemented these techniques in OMPDataPerf, which provides detailed traces of problematic data mappings, source code attribution, and assessments of optimization potential in heterogeneous OpenMP applications. OMPDataPerf uses the OpenMP Tools Interface (OMPT) and incurs only a 5% geometric-mean runtime overhead.
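To make the idea of detecting an inefficient transfer pattern concrete, the sketch below flags host-to-device copies that resend identical bytes to the same device and attributes the wasted time to a source location. It is an illustration under stated assumptions, not OMPDataPerf's actual algorithm: the `Transfer` record, its fields, the content-hash comparison, and the sample trace are all hypothetical, and the sketch ignores whether the device copy was modified or freed between transfers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One host-to-device copy as a profiler might record it (hypothetical schema)."""
    location: str      # source location of the mapping, e.g. "solver.c:27"
    device: int        # destination device number
    content_hash: int  # hash of the bytes that were copied
    nbytes: int        # size of the copy in bytes
    seconds: float     # measured duration of the copy


def find_duplicate_transfers(trace):
    """Report copies that resend identical bytes to the same device.

    Returns (location, redundant_copies, wasted_seconds) tuples sorted by
    wasted time, largest first. The first copy in each group is treated as
    necessary; every later copy counts as redundant (a simplification).
    """
    groups = defaultdict(list)
    for t in trace:
        groups[(t.device, t.content_hash, t.nbytes)].append(t)

    wasted = defaultdict(lambda: [0, 0.0])  # location -> [count, seconds]
    for copies in groups.values():
        for t in copies[1:]:  # skip the first, necessary copy
            wasted[t.location][0] += 1
            wasted[t.location][1] += t.seconds

    report = [(loc, n, secs) for loc, (n, secs) in wasted.items()]
    report.sort(key=lambda row: row[2], reverse=True)
    return report


# Hypothetical trace: the same 4 KiB array is resent from solver.c:27.
trace = [
    Transfer("solver.c:10", 0, 0xABC, 4096, 0.002),
    Transfer("solver.c:27", 0, 0xABC, 4096, 0.002),
    Transfer("solver.c:27", 0, 0xABC, 4096, 0.002),
    Transfer("solver.c:31", 0, 0xDEF, 1024, 0.001),
]
report = find_duplicate_transfers(trace)
for location, count, seconds in report:
    print(f"{location}: {count} redundant copies, {seconds:.3f} s recoverable")
```

In this toy trace only `solver.c:27` is reported, since it is the only location that repeats a copy already present on the device. The summed duration of the redundant copies serves as a rough stand-in for the "optimization potential" the abstract mentions.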