ParaCodex: A Profiling-Guided Autonomous Coding Agent for Reliable Parallel Code Generation and Translation

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses data movement and performance tuning in OpenMP GPU offloading with ParaCodex, a Codex-based autonomous coding agent. The agent follows a domain-knowledge-guided workflow that combines staged hotspot analysis, explicit data planning, correctness gating, and profiling-guided iterative refinement to translate serial CPU code into OpenMP GPU offload code. After five kernels were excluded, it succeeded on all 31 valid kernels from HeCBench, Rodinia, and NAS, and 25 of the 31 improved GPU time over the reference OpenMP implementations. Geometric-mean speedups were 3× on HeCBench and 5× on Rodinia, and ParaCodex outperformed a zero-shot Codex baseline on all suites. In CUDA-to-OpenMP translation on ParEval, it maintained high compilation and validation rates in both code-only and end-to-end settings.

📝 Abstract
Parallel programming is central to HPC and AI, but producing code that is correct and fast remains challenging, especially for OpenMP GPU offload, where data movement and tuning dominate. Autonomous coding agents can compile, test, and profile on target hardware, but outputs are brittle without domain scaffolding. We present ParaCodex, an HPC-engineer workflow that turns a Codex-based agent into an autonomous OpenMP GPU offload system using staged hotspot analysis, explicit data planning, correctness gating, and profiling-guided refinement. We evaluate translation from serial CPU kernels to OpenMP GPU offload kernels on HeCBench, Rodinia, and NAS. After excluding five kernels, ParaCodex succeeded on all 31 valid kernels. The generated kernels improved GPU time over reference OpenMP implementations in 25/31 cases, achieving geometric-mean speedups of 3x on HeCBench and 5x on Rodinia, and outperforming a zero-shot Codex baseline on all suites. We also evaluate CUDA to OpenMP offload translation on ParEval, where ParaCodex maintains high compilation and validation rates in code-only and end-to-end settings.
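To make the terms in the abstract concrete, here is a minimal sketch of what "explicit data planning" and "correctness gating" can look like for an OpenMP GPU offload kernel. It is an illustration only and is not taken from the paper: the kernel and the function names (`saxpy_serial`, `saxpy_offload`, `outputs_match`) are hypothetical. The `target data` region keeps both arrays resident on the device across all steps, so data is transferred once rather than once per kernel launch, and the gate compares the offloaded result against the serial reference.

```c
#include <stddef.h>

/* Hypothetical serial CPU reference: y = a*x + y, repeated `steps` times. */
void saxpy_serial(int n, int steps, float a, const float *x, float *y) {
    for (int s = 0; s < steps; ++s)
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
}

/* Hypothetical offload version of the same kernel.
 * Data plan: x is copied to the device once, y is copied in once and
 * copied back once; neither array moves between steps. */
void saxpy_offload(int n, int steps, float a, const float *x, float *y) {
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* Each step launches one device kernel on the resident data. */
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }
    }
}

/* Correctness gate: returns 1 only if every element of `out` is within
 * `tol` of the serial reference `ref`, otherwise 0. */
int outputs_match(int n, const float *ref, const float *out, float tol) {
    for (int i = 0; i < n; ++i) {
        float d = ref[i] - out[i];
        if (d < 0.0f) d = -d;
        if (d > tol) return 0;
    }
    return 1;
}
```

A caller would run both versions on identical inputs and accept the offload version only when `outputs_match` returns 1; timing it would come after that check. When built without OpenMP support the pragmas are ignored and the offload version runs serially on the host, so the gate can still be exercised on a machine without a GPU.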
Problem

Research questions and friction points this paper is trying to address.

parallel programming
OpenMP GPU offload
code correctness
performance optimization
autonomous coding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Profiling-Guided Code Generation
Autonomous Coding Agent
OpenMP GPU Offload
Correctness Gating
Data Movement Planning
Authors
Erel Kaplan
Technion – Israel Institute of Technology, Haifa, Israel
Tomer Bitan
Technion – Israel Institute of Technology, Haifa, Israel
Lian Ghrayeb
Technion – Israel Institute of Technology, Haifa, Israel
Le Chen
Argonne National Laboratory
LLM, HPC, ML4Code, AI4Code, automatic parallelization
Tom Yotam
Code Metal, USA
Niranjan Hasabnis
Principal Research Scientist, Code Metal
ML for systems, Compilers, Software engineering, Debugging, Code transpilation
Gal Oren
Visiting Scholar, Stanford | Assistant Professor of CS, Technion
Scientific Computing, Artificial Intelligence, HPC