Fast Algorithms for Scheduling Many-body Correlation Functions on Accelerators

📅 2025-11-04
🤖 AI Summary
Problem: High memory consumption and GPU–host data transfers are bottlenecks when computing many-body correlation functions in lattice quantum chromodynamics (LQCD). Method: This work proposes two fast scheduling algorithms for binary batched tensor contractions. The algorithms reorder contractions to increase temporal locality through reuse of input and intermediate tensors. They exploit the fact that contractions are binary, and the locality within contraction trees, to minimize peak memory. Both are integrated into the Redstar analysis suite. Contribution/Results: Experiments show up to a 2.1× reduction in peak memory, up to 4.2× fewer evictions, up to 1.8× less data traffic, and up to 1.9× faster correlation function computation.

📝 Abstract
Computation of correlation functions is a key operation in lattice quantum chromodynamics (LQCD) simulations to extract nuclear physics observables. These functions involve many binary batch tensor contractions, each tensor possibly occupying hundreds of MB of memory. Performing these contractions on GPU accelerators poses the challenge of scheduling them so as to optimize tensor reuse and reduce data traffic. In this work we propose two fast, novel scheduling algorithms that reorder contractions to increase temporal locality via input/intermediate tensor reuse. Our schedulers take advantage of application-specific features, such as contractions being binary and locality within contraction trees, to optimize the objective of minimizing peak memory. We integrate them into the LQCD analysis software suite Redstar and improve time-to-solution. Our schedulers attain up to a 2.1x improvement in peak memory, which is reflected in a reduction of up to 4.2x in evictions and up to 1.8x in data traffic, resulting in up to 1.9x faster correlation function computation.
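To make the peak-memory objective concrete, the sketch below measures the peak memory of two orderings of the same set of binary contractions. It is a toy model and not the paper's schedulers: the function `peak_memory`, the tensor names, and the unit sizes are all hypothetical, and it assumes a tensor is freed as soon as its last consumer has run.

```python
from collections import Counter

def peak_memory(schedule, sizes):
    """Peak live memory of a contraction order (toy model, not Redstar's).

    schedule: list of (left, right, out) binary contractions, in execution order.
    sizes:    dict mapping tensor name to its size (arbitrary units).
    Assumption: a tensor is released right after its last use, and a final
    result that no later contraction consumes is released once computed.
    """
    # Number of contractions that still need each tensor as an operand.
    remaining = Counter()
    for left, right, _ in schedule:
        remaining[left] += 1
        remaining[right] += 1

    live = set()
    current = peak = 0
    for left, right, out in schedule:
        # Operands and the output must all be resident during the contraction.
        for t in (left, right, out):
            if t not in live:
                live.add(t)
                current += sizes[t]
        peak = max(peak, current)
        # Release operands that have no further consumers.
        for t in (left, right):
            remaining[t] -= 1
            if remaining[t] == 0 and t in live:
                live.remove(t)
                current -= sizes[t]
        # Release results that nothing else consumes.
        if remaining[out] == 0 and out in live:
            live.remove(out)
            current -= sizes[out]
    return peak

# Three two-level contraction trees that share the input tensor "a".
breadth_first = [("a", "b", "t1"), ("a", "c", "t2"), ("a", "d", "t3"),
                 ("t1", "e", "r1"), ("t2", "f", "r2"), ("t3", "g", "r3")]
depth_first = [("a", "b", "t1"), ("t1", "e", "r1"),
               ("a", "c", "t2"), ("t2", "f", "r2"),
               ("a", "d", "t3"), ("t3", "g", "r3")]
sizes = {t: 1 for step in breadth_first for t in step}

bf_peak = peak_memory(breadth_first, sizes)
df_peak = peak_memory(depth_first, sizes)
```

In this toy instance the breadth-first order keeps all three intermediates alive at once, while finishing each tree before starting the next lets every intermediate be released immediately. The orderings do the same work but differ in peak memory, which is the quantity a scheduler of this kind tries to reduce.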
Problem

Research questions and friction points this paper is trying to address.

Optimizing tensor reuse in many-body correlation function computations
Reducing data traffic for binary batch tensor contractions on GPUs
Minimizing peak memory usage in LQCD simulation scheduling algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel scheduling algorithms reorder contractions for temporal locality
Schedulers minimize peak memory by exploiting binary contractions and locality within contraction trees
Integration into the Redstar LQCD analysis suite reduces evictions and data traffic
Oguz Selvitopi
Lawrence Berkeley National Laboratory
Emin Ozturk
School of Computing, University of Utah
Jie Chen
Jefferson Lab, Newport News
P. Sadayappan
School of Computing, University of Utah
Robert G. Edwards
Jefferson Lab, Newport News
Aydin Buluc
Department of Electrical Engineering and Computer Sciences, University of California