Performance Characterization and Optimizations of Traditional ML Applications

📅 2024-12-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Even amid the rise of deep learning, traditional machine learning (ML) algorithms face performance bottlenecks on large-scale datasets. This paper presents the first fine-grained performance attribution analysis to systematically identify root causes of inefficient cache and main-memory access in mainstream ML libraries such as scikit-learn. The authors propose two lightweight, scikit-learn–compatible optimizations: (1) a hardware-aware software prefetching mechanism, and (2) a DRAM-locality–driven data layout restructuring and computation reordering strategy. Evaluated on real systems and hardware simulation platforms, prefetching improves execution performance by 5.2%–27.1%, while layout restructuring and reordering yield gains of 6.16%–28.0%. These optimizations enhance the scalability and competitiveness of classical ML methods in big-data scenarios without requiring architectural modifications or user-facing API changes.

📝 Abstract
Even in the era of Deep Learning based methods, traditional machine learning methods with large data sets continue to attract significant attention. However, we find an apparent lack of a detailed performance characterization of these methods in the context of large training datasets. In this work, we study the system-level behavior of a number of traditional ML methods as implemented in popular free software libraries/modules to identify critical performance bottlenecks experienced by these applications. The characterization study reveals several interesting insights into the behavior of these applications. Then we evaluate the performance benefits of applying some well-known optimizations at the levels of caches and the main memory. More specifically, we test the usefulness of optimizations such as (i) software prefetching to improve cache performance and (ii) data layout and computation reordering optimizations to improve locality in DRAM accesses. These optimizations are implemented as modifications to the well-known scikit-learn library, and hence can be easily leveraged by application programmers. We evaluate the impact of the proposed optimizations using a combination of simulation and execution on a real system. The software prefetching optimization results in performance benefits varying from 5.2%–27.1% on different ML applications, while the data layout and computation reordering approaches yield 6.16%–28.0% performance improvement.
Problem

Research questions and friction points this paper is trying to address.

Big Data
Machine Learning Performance
Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Software Prefetching
Memory Access Efficiency
Optimization Strategies
Harsh Kumar
PhD Student, University of Toronto
computational social science · human-computer interaction · large language models
R. Govindarajan
Department of Computer Science & Automation, Indian Institute of Science, Bangalore