Performance Characterization and Optimizations of Traditional ML Applications

📅 2024-12-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Even amid the rise of deep learning, traditional machine learning (ML) algorithms face performance bottlenecks on large-scale datasets. This paper presents the first fine-grained performance attribution analysis to systematically identify root causes of inefficient cache and main-memory access in mainstream ML libraries such as scikit-learn. The authors propose two lightweight, scikit-learn–compatible optimizations: (1) a hardware-aware software prefetching mechanism, and (2) a DRAM-locality–driven data layout restructuring and computation reordering strategy. Evaluated on real systems and hardware simulation platforms, prefetching improves execution performance by 5.2%–27.1%, while layout restructuring and reordering yield gains of 6.16%–28.0%. These optimizations enhance the scalability and competitiveness of classical ML methods in big-data scenarios without requiring architectural modifications or user-facing API changes.

📝 Abstract
Even in the era of Deep Learning based methods, traditional machine learning methods with large data sets continue to attract significant attention. However, we find an apparent lack of a detailed performance characterization of these methods in the context of large training datasets. In this work, we study the system-level behavior of a number of traditional ML methods as implemented in popular free software libraries/modules to identify critical performance bottlenecks experienced by these applications. The characterization study reveals several interesting insights into the behavior of these applications. Then we evaluate the performance benefits of applying some well-known optimizations at the levels of caches and the main memory. More specifically, we test the usefulness of optimizations such as (i) software prefetching to improve cache performance and (ii) data layout and computation reordering optimizations to improve locality in DRAM accesses. These optimizations are implemented as modifications to the well-known scikit-learn library, and hence can be easily leveraged by application programmers. We evaluate the impact of the proposed optimizations using a combination of simulation and execution on a real system. The software prefetching optimization results in performance benefits varying from 5.2%–27.1% on different ML applications, while the data layout and computation reordering approaches yield 6.16%–28.0% performance improvement.
Problem

Research questions and friction points this paper is trying to address.

Big Data
Machine Learning Performance
Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Software Prefetching
Memory Access Efficiency
Optimization Strategies
Harsh Kumar
PhD Student, University of Toronto
computational social science · human-computer interaction · large language models
R. Govindarajan
Department of Computer Science & Automation, Indian Institute of Science, Bangalore