🤖 AI Summary
Evaluating the expressiveness and performance trade-offs among parallel functional array languages remains challenging due to fragmented benchmarks and inconsistent evaluation methodologies.
Method: This work conducts a systematic, cross-language assessment of five prominent functional array languages—Accelerate, APL, DaCe, Futhark, and SaC—across four representative compute-intensive benchmarks (N-body, MultiGrid, Quickhull, Flash Attention), measuring program conciseness, cross-platform portability (x86_64/OpenMP and CUDA), and real-world multicore CPU and GPU performance. Experiments employ a 32-core AMD EPYC 7313 processor and an NVIDIA A30 GPU.
Contribution/Results: To our knowledge, this is the first unified, apples-to-apples comparison of mainstream functional array languages on a common benchmark suite. Results from 39 test configurations show that the best-performing language can match hand-optimized baseline performance while reducing code size by over 50%, improving developer productivity and code readability. The study validates the feasibility of compiling a single source code base to multiple high-performance backends and highlights these languages' strong adaptability to emerging parallel architectures.
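To give a flavor of the whole-array style these languages share, here is a minimal NumPy sketch (our illustration, not code from the paper) of the all-pairs acceleration step at the heart of the N-body benchmark. The function name, softening parameter, and formulation are our own; the point is that the kernel is a handful of architecture-neutral array operations rather than explicit nested loops.

```python
import numpy as np

def accelerations(pos, mass, eps=1e-3):
    """All-pairs gravitational accelerations via whole-array operations.

    pos:  (n, 3) particle positions
    mass: (n,)   particle masses
    eps:  softening term to avoid division by zero at i == j
    """
    # Pairwise displacement vectors: d[i, j] = pos[j] - pos[i], shape (n, n, 3)
    d = pos[None, :, :] - pos[:, None, :]
    # Softened squared distances and inverse-cube factors, shape (n, n)
    r2 = (d ** 2).sum(axis=-1) + eps ** 2
    inv_r3 = r2 ** -1.5
    # a[i] = sum_j mass[j] * d[i, j] / |d[i, j]|^3  (G = 1)
    return (d * (mass * inv_r3)[..., None]).sum(axis=1)
```

In the languages compared here, a kernel of this shape compiles unchanged to both multicore and GPU code; the hand-optimized baselines instead spell out tiling, threading, and memory placement per architecture.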
📝 Abstract
Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. We systematically compare the designs and implementations of five different functional array languages: Accelerate, APL, DaCe, Futhark, and SaC. We demonstrate the expressiveness of functional array programming by means of four challenging benchmarks, namely N-body simulation, MultiGrid, Quickhull, and Flash Attention. These benchmarks represent a range of application domains and parallel computational models. We argue that the functional array code is much shorter and more comprehensible than the hand-optimized baseline implementations because it omits architecture-specific aspects. Instead, the language implementations generate both multicore and GPU executables from a single source code base. Hence, we further argue that functional array code could more easily be ported to, and optimized for, new parallel architectures than conventional implementations of numerical kernels. We demonstrate this potential by reporting the performance of the five parallel functional array languages on a total of 39 instances of the four benchmarks, on both a 32-core AMD EPYC 7313 multicore system and an NVIDIA A30 GPU. We explore in depth why each language performs well, or less well, on each benchmark and architecture. We argue that the results demonstrate that mature functional array languages have the potential to deliver performance competitive with the best available conventional techniques.