Towards High-Performance and Portable Molecular Docking on CPUs through Vectorization

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fundamental trade-off between high performance and cross-architecture portability in molecular docking on modern CPUs (x86/ARM) with long-vector units. We propose a systematic source-code transformation strategy that significantly improves compiler auto-vectorization efficiency—achieving performance close to hand-written SIMD implementations—while preserving algorithmic clarity. For the first time, we identify and characterize critical semantic constraints and loop-structure features governing cross-platform vectorization effectiveness. Experimental evaluation shows x86 attains higher peak performance due to wider vector units, whereas ARM delivers superior energy efficiency and cost-effectiveness. Our methodology provides a reusable, compiler-centric pathway toward “write-once, efficiently vectorize-everywhere” for HPC scientific applications. This advances the paradigm of portable, compiler-driven high-performance computing by bridging the gap between abstraction and hardware-specific optimization.
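The kind of source-code transformation the summary describes can be illustrated with a minimal sketch. The kernel below is hypothetical (not taken from the paper): it shows common changes that help compilers auto-vectorize a docking-style scoring loop, such as `restrict`-qualified pointers to rule out aliasing and a branch-free inner loop whose only loop-carried dependency is the reduction.

```c
#include <stddef.h>

/* Hypothetical docking-style kernel: accumulate a pairwise score
 * over precomputed coordinate differences.
 *
 * The `restrict` qualifiers assert the arrays do not alias, and the
 * inner loop is branch-free with no loop-carried dependency other
 * than the `acc` reduction, so compilers such as gcc/clang at -O3
 * can typically auto-vectorize it without intrinsics. */
static double score(const double *restrict dx,
                    const double *restrict dy,
                    const double *restrict dz,
                    size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r2 = dx[i] * dx[i] + dy[i] * dy[i] + dz[i] * dz[i];
        acc += 1.0 / (1.0 + r2);   /* smooth, branch-free score term */
    }
    return acc;
}
```

Whether the loop actually vectorizes can be checked with compiler reports (e.g. `-fopt-info-vec` on GCC or `-Rpass=loop-vectorize` on Clang), which is how transformations like this are usually validated across x86 and ARM toolchains.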

📝 Abstract
Recent trends in the HPC field have introduced new CPU architectures with improved vectorization capabilities that require optimization to achieve peak performance and thus pose challenges for performance portability. The deployment of high-performing scientific applications for CPUs requires adapting the codebase and optimizing for performance. Evaluating these applications provides insights into the complex interactions between code, compilers, and hardware. We evaluate compiler auto-vectorization and explicit vectorization to achieve performance portability across modern CPUs with long vectors. We select a molecular docking application as a case study, as it represents computational patterns commonly found across HPC workloads. We report insights into the technical challenges, architectural trends, and optimization strategies relevant to the future development of scientific applications for HPC. Our results show which code transformations enable portable auto-vectorization, reaching performance similar to explicit vectorization. Experimental data confirms that x86 CPUs typically achieve higher execution performance than ARM CPUs, primarily due to their wider vectorization units. However, ARM architectures demonstrate competitive energy consumption and cost-effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Optimizing molecular docking for performance portability across CPUs
Evaluating auto-vectorization versus explicit vectorization techniques
Comparing x86 and ARM CPU performance and energy efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vectorization for CPU performance portability
Compiler auto-vectorization versus explicit vectorization
Molecular docking application case study optimization
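The auto- versus explicit-vectorization comparison can be sketched without hand-written intrinsics: a portable middle ground is a directive-based approach, where the programmer explicitly asserts vectorizability but leaves instruction selection to the compiler. The example below is an illustration (not the paper's code) using the standard OpenMP `simd` construct.

```c
#include <stddef.h>

/* Explicit (directive-based) vectorization: the OpenMP `simd`
 * pragma tells the compiler the loop is safe to vectorize and
 * that `acc` is a reduction, without committing the source to
 * x86- or ARM-specific intrinsics. Compilers that do not honor
 * the pragma simply fall back to the scalar loop. */
double dot(const double *a, const double *b, size_t n) {
    double acc = 0.0;
    #pragma omp simd reduction(+:acc)
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```

This keeps one codebase for both architectures, which is the portability property the paper evaluates against hand-tuned explicit SIMD.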