Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs

📅 2025-02-17
🤖 AI Summary
Problem: Automatic vectorization across diverse architectures remains poorly understood, with limited cross-platform comparability and an unclear relationship between vectorization success and actual performance gains. Method: This work systematically evaluates GCC, ICX, and Clang on x86, and GCC, ACFL, and Clang on ARM, using a unified, corrected TSVC2 benchmark. It introduces the first cross-architectural, multi-compiler, semantically consistent empirical framework, integrating assembly-level verification, cross-platform performance measurement, and statistical modeling. Results: The share of loops reported as vectorized peaks at 54% on x86 and 56% on ARM (both with GCC), yet it correlates only weakly with speedup: roughly 20% of vectorized loops yield no performance improvement. Compilers are highly sensitive to minor code variations and make pronounced architecture-specific vectorization decisions. No compiler dominates across all benchmarks on either platform. The study highlights the counterintuitive finding that “vectorization success ≠ acceleration,” challenging common assumptions in compiler optimization and HPC practice.

📝 Abstract
Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested the ability of several compilers to vectorize code on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as having been vectorized, ICX reported 50%, and Clang, 46%. On ARM, GCC reported 56% of the loops as having been vectorized, ACFL reported 54%, and Clang, 47%. We found that the vectorized code did not always outperform the unvectorized code. In some cases, given two very similar vectorizable loops, a compiler would vectorize one but not the other. We also report cases where a compiler vectorized a loop on only one of the two platforms. Based on our experiments, we cannot definitively say whether any one compiler is significantly better than the others at vectorizing code on any given platform.
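To make the notion of a vectorizable loop concrete, here is a minimal sketch in C. Both functions are hypothetical illustrations written for this summary and are not taken from TSVC2. The first loop has independent iterations, the textbook case for a vector unit applying one operation to several operands at once. The second looks similar but carries a dependence from one iteration to the next, which typically prevents straightforward vectorization.

```c
#include <stddef.h>

/* Hypothetical example (not from TSVC2): every iteration is independent,
   so a compiler may compute several elements per vector instruction.
   `restrict` promises the arrays do not overlap. */
void add_arrays(float *restrict a, const float *restrict b,
                const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hypothetical near-twin: iteration i reads the value written by
   iteration i - 1 (a loop-carried dependence), so the iterations
   cannot simply be executed side by side in vector lanes. */
void running_sum(float *restrict a, const float *restrict b, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```

Assuming the file is saved as `loops.c` (a name chosen here for illustration), `gcc -O3 -fopt-info-vec -c loops.c` or `clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c loops.c` prints which loops each compiler reports as vectorized. The exact flags and reports used in the study may differ, and, as the abstract notes, a loop being reported as vectorized does not guarantee that it runs faster.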
Problem

Research questions and friction points this paper is trying to address.

Compare compilers' vectorization on x86 and ARM
Assess vectorized code performance impact
Evaluate compiler consistency across platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tested compilers' vectorization on x86 and ARM
Used modified TSVC2 suite for realistic results
Compared vectorization performance across multiple compilers
Nazmus Sakib
Klipsch School of ECE, New Mexico State University, Las Cruces, NM 88003, USA
Tarun Prabhu
Los Alamos National Laboratory, Los Alamos, NM 87545, USA
N. Santhi
Los Alamos National Laboratory, Los Alamos, NM 87545, USA
J. Shalf
Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Abdel-Hameed A. Badawy
Associate Professor, Klipsch School of Electrical & Computer Engineering, New Mexico State Univ.
Performance Modeling and Prediction, High Performance Computing, Computer Architecture, Hardware Security, Networks on Chip