RCOMPSs: A Scalable Runtime System for R Code Execution on Manycore Systems

📅 2025-05-11
🤖 AI Summary
R's native parallelism is too limited to make efficient use of multicore and manycore systems in large-scale data analytics. This paper introduces RCOMPSs, a task-based runtime system for R that lets users keep writing code in a sequential style. The runtime extracts tasks dynamically, builds their dependency graph, executes them asynchronously, and schedules them across nodes in both shared-memory and distributed-memory configurations. RCOMPSs is built on the COMPSs programming model through R language bindings and is evaluated on two HPC platforms, KAUST Shaheen-III and BSC MareNostrum 5. The experiments show strong and weak scalability up to 4,096 cores (128 cores/node × 32 nodes). Parallel efficiency stays above 70% in most settings for KNN and K-means, and distributed linear regression sustains acceptable performance despite its deeper task dependencies.
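To make "dependency graph construction" concrete, here is a minimal Python sketch of how a runtime could derive task dependencies from the data each task reads and writes. It is an illustration only and not the RCOMPSs or COMPSs implementation: the function `build_dependency_graph`, the task names, and the data names are all hypothetical, and only read-after-write dependencies are tracked.

```python
def build_dependency_graph(tasks):
    """Derive dependencies from data accesses (illustrative, not RCOMPSs code).

    tasks: list of (name, inputs, outputs) tuples in program order.
    Returns a dict mapping each task name to the set of tasks it must wait for.
    """
    last_writer = {}  # data name -> task that most recently produced it
    deps = {}
    for name, inputs, outputs in tasks:
        # A task depends on whichever earlier task produced each of its inputs.
        deps[name] = {last_writer[d] for d in inputs if d in last_writer}
        for d in outputs:
            last_writer[d] = name
    return deps


# Hypothetical program: two independent fits followed by a merge.
program = [
    ("fit_1", ["block_1"], ["partial_1"]),
    ("fit_2", ["block_2"], ["partial_2"]),
    ("merge", ["partial_1", "partial_2"], ["model"]),
]
graph = build_dependency_graph(program)
```

In this sketch `fit_1` and `fit_2` have no dependencies and could run concurrently, while `merge` must wait for both.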

📝 Abstract
R has become a cornerstone of scientific and statistical computing due to its extensive package ecosystem, expressive syntax, and strong support for reproducible analysis. However, as data sizes and computational demands grow, native R support for parallelism remains limited. This paper presents RCOMPSs, a scalable runtime system that enables efficient parallel execution of R applications on multicore and manycore systems. RCOMPSs adopts a dynamic, task-based programming model, allowing users to write code in a sequential style, while the runtime automatically handles asynchronous task execution, dependency tracking, and scheduling across available resources. We demonstrate RCOMPSs with three representative data analysis algorithms, namely K-nearest neighbors (KNN) classification, K-means clustering, and linear regression, and evaluate their performance on two modern HPC systems: KAUST Shaheen-III and Barcelona Supercomputing Center (BSC) MareNostrum 5. Experimental results show that RCOMPSs achieves both strong and weak scalability on up to 128 cores per node and across 32 nodes. For KNN and K-means, parallel efficiency remains above 70% in most settings, while linear regression maintains acceptable performance under shared and distributed memory configurations despite its deeper task dependencies. Overall, RCOMPSs significantly enhances the parallel capabilities of R while requiring minimal user intervention, because parallelization is automated and handled by the runtime, making it a practical solution for large-scale data analytics in high-performance environments.
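For readers unfamiliar with the scalability terms, the following Python sketch shows the standard definitions of strong-scaling and weak-scaling parallel efficiency. The function names and all timings are hypothetical and are not measurements from the paper; the abstract does not state which exact formula the authors used.

```python
def strong_scaling_efficiency(t_base, t_p, cores_base, cores_p):
    """Fixed total problem size: speedup divided by the increase in cores."""
    speedup = t_base / t_p
    return speedup / (cores_p / cores_base)


def weak_scaling_efficiency(t_base, t_p):
    """Fixed problem size per core: ideal runtime stays constant."""
    return t_base / t_p


# Made-up timings (seconds), chosen only to illustrate the arithmetic.
strong = strong_scaling_efficiency(t_base=128.0, t_p=1.25, cores_base=1, cores_p=128)
weak = weak_scaling_efficiency(t_base=10.0, t_p=12.5)
total_cores = 128 * 32  # 128 cores per node across 32 nodes
```

With these invented numbers both efficiencies come out at about 0.8, which would count as "above 70%" in the sense used in the abstract.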
Problem

Research questions and friction points this paper is trying to address.

Limited native parallelism in R on multicore and manycore systems
Manual effort of task scheduling and dependency management
Scalability of R for large-scale data analytics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic task-based programming model for R
Automatic asynchronous task execution and scheduling
Scalable runtime for multicore and manycore systems
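The idea behind the items above, sequential-looking code whose tasks run asynchronously, can be sketched in Python with the standard `concurrent.futures` module. This is an analogy under stated assumptions, not the RCOMPSs API: the `task` decorator, `wait_on`, `partial_sum`, `merge`, and `total` are hypothetical names, and a thread pool stands in for the real runtime's workers.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

_pool = ThreadPoolExecutor(max_workers=4)  # stand-in for the runtime's workers


def task(fn):
    """Turn a plain function into an asynchronous task (hypothetical helper)."""
    @wraps(fn)
    def submit(*args):
        def run():
            # Dependency handling: wait for every Future argument to finish.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return fn(*resolved)
        return _pool.submit(run)  # returns immediately with a Future
    return submit


def wait_on(obj):
    """Synchronize: block until the value behind a Future is available."""
    return obj.result() if isinstance(obj, Future) else obj


@task
def partial_sum(chunk):
    return sum(chunk)


@task
def merge(a, b):
    return a + b


def total(chunks):
    """Sequential-style code; assumes at least one chunk."""
    parts = [partial_sum(c) for c in chunks]  # independent tasks
    while len(parts) > 1:  # pairwise reduction tree
        merged = [merge(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            merged.append(parts[-1])
        parts = merged
    return wait_on(parts[0])


result = total([[1, 2], [3, 4], [5, 6]])
```

The caller never manages threads or ordering explicitly. Each call returns a future at once, and only `wait_on` blocks, which mirrors the "write sequential code, let the runtime schedule it" model described above.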