Efficient Task Graph Scheduling for Parallel QR Factorization in SLSQP

📅 2025-06-11
🤖 AI Summary
Problem: The SLSQP algorithm suffers from performance bottlenecks because its QR factorization step is highly sensitive to memory access patterns and to how intermediate results are stored. Method: This paper proposes a task-graph scheduling method for QR factorization with state-dependent iterative back-substitution. It introduces a two-queue scheduling scheme that, within DAG-based scheduling, explicitly keeps the intermediate kernel results of the QR factorization accessible and reusable across iterations. A high-level C++ task-graph framework is developed that integrates compiler optimizations, memory-aware scheduling, and fine-grained dependency modeling. Contribution/Results: Experimental evaluation reports a 10x improvement over the sequential QR version of the SLSQP algorithm, improving parallel efficiency and scalability for nonlinear programming problems.

📝 Abstract
Efficient task scheduling is paramount in parallel programming on multi-core architectures, where tasks are fundamental computational units. QR factorization is a critical sub-routine in Sequential Least Squares Quadratic Programming (SLSQP) for solving non-linear programming (NLP) problems. QR factorization decomposes a matrix into an orthogonal matrix Q and an upper triangular matrix R, which are essential for solving systems of linear equations arising from optimization problems. SLSQP uses an in-place version of QR factorization, which requires storing intermediate results for the next steps of the algorithm. Although DAG-based approaches for QR factorization are prevalent in the literature, they often lack control over the intermediate kernel results, providing only the final output matrices Q and R. This limitation is particularly challenging in SLSQP, where intermediate results of QR factorization are crucial for back-substitution logic at each iteration. Our work introduces novel scheduling techniques using a two-queue approach to execute the QR factorization kernel effectively. This approach, implemented in the high-level C++ programming language, facilitates compiler optimizations and allows storing the intermediate results required by the back-substitution logic. Empirical evaluations demonstrate substantial performance gains, including a 10x improvement over the sequential QR version of the SLSQP algorithm.
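
To make the role of the intermediate results concrete, the following is a minimal sequential sketch of an in-place Householder QR factorization followed by back-substitution. It is not the paper's kernel or its parallel schedule. The `Matrix` type, the function names, the column-major layout, and the full-column-rank assumption are all illustrative choices made here. The point it shows is that the reflectors stored below the diagonal (together with `tau`) are needed again after the factorization, which is the kind of intermediate state that is lost when only final Q and R matrices are returned.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense column-major matrix: entry (i, j) is data[i + j * rows].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double& at(std::size_t i, std::size_t j) { return data[i + j * rows]; }
    double at(std::size_t i, std::size_t j) const { return data[i + j * rows]; }
};

// In-place Householder QR, assuming rows >= cols.
// On return, R occupies the upper triangle of `a`. The Householder vector of
// column k (leading 1 implicit) is kept below the diagonal, and tau[k] is its
// scalar factor, so H_k = I - tau[k] * v_k * v_k^T.
std::vector<double> householder_qr_inplace(Matrix& a) {
    std::vector<double> tau(a.cols, 0.0);
    for (std::size_t k = 0; k < a.cols; ++k) {
        double norm = 0.0;
        for (std::size_t i = k; i < a.rows; ++i) norm += a.at(i, k) * a.at(i, k);
        norm = std::sqrt(norm);
        if (norm == 0.0) continue;  // column already zero, nothing to eliminate

        const double x0 = a.at(k, k);
        const double beta = (x0 >= 0.0) ? -norm : norm;  // sign chosen to avoid cancellation
        tau[k] = (beta - x0) / beta;
        for (std::size_t i = k + 1; i < a.rows; ++i) a.at(i, k) /= (x0 - beta);
        a.at(k, k) = beta;

        // Apply H_k to the trailing columns.
        for (std::size_t j = k + 1; j < a.cols; ++j) {
            double w = a.at(k, j);
            for (std::size_t i = k + 1; i < a.rows; ++i) w += a.at(i, k) * a.at(i, j);
            w *= tau[k];
            a.at(k, j) -= w;
            for (std::size_t i = k + 1; i < a.rows; ++i) a.at(i, j) -= w * a.at(i, k);
        }
    }
    return tau;
}

// Solves min ||A x - b|| from the factored form, assuming full column rank.
// It reuses the stored reflectors to form Q^T b, then back-substitutes with R.
std::vector<double> solve_least_squares(const Matrix& qr,
                                        const std::vector<double>& tau,
                                        std::vector<double> b) {
    for (std::size_t k = 0; k < qr.cols; ++k) {
        double w = b[k];
        for (std::size_t i = k + 1; i < qr.rows; ++i) w += qr.at(i, k) * b[i];
        w *= tau[k];
        b[k] -= w;
        for (std::size_t i = k + 1; i < qr.rows; ++i) b[i] -= w * qr.at(i, k);
    }
    std::vector<double> x(qr.cols, 0.0);
    for (std::size_t i = qr.cols; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < qr.cols; ++j) s -= qr.at(i, j) * x[j];
        x[i] = s / qr.at(i, i);
    }
    return x;
}
```

A caller would factor once with `householder_qr_inplace(a)` and then pass the same `a` and the returned `tau` to `solve_least_squares` for each right-hand side. The compact storage follows the common LAPACK-style convention rather than anything stated in the abstract.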
Problem

Research questions and friction points this paper is trying to address.

Efficient task scheduling for parallel QR factorization in SLSQP.
Control over intermediate QR results for back-substitution logic.
Performance improvement in QR factorization for optimization problems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-queue approach for task scheduling
In-place QR factorization for SLSQP
High-level C++ implementation with optimizations
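
The summary does not say what role each of the two queues plays, so the sketch below is only one hypothetical reading and should not be taken as the paper's scheduler. It runs a task DAG sequentially with a `ready` queue (tasks whose dependencies have finished) and a `waiting` queue (tasks still blocked), promoting tasks from the second to the first as their predecessors complete. The `Task` type and `run_two_queue` function are names invented for this illustration.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical task record: a callable plus the indices of the tasks
// that must finish before it may run.
struct Task {
    std::function<void()> run;
    std::vector<std::size_t> deps;
};

// Executes the tasks in dependency order and returns the execution order.
// Tasks left unexecuted (for example because of a dependency cycle) are
// simply absent from the returned order.
std::vector<std::size_t> run_two_queue(const std::vector<Task>& tasks) {
    std::vector<bool> done(tasks.size(), false);
    std::deque<std::size_t> ready, waiting;

    auto runnable = [&](std::size_t t) {
        for (std::size_t d : tasks[t].deps) {
            if (!done[d]) return false;
        }
        return true;
    };

    // Initial split: unblocked tasks are ready, the rest wait.
    for (std::size_t t = 0; t < tasks.size(); ++t) {
        (runnable(t) ? ready : waiting).push_back(t);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t t = ready.front();
        ready.pop_front();
        tasks[t].run();
        done[t] = true;
        order.push_back(t);

        // One pass over the waiting queue: promote tasks that became runnable,
        // and put the others back in their original relative order.
        for (std::size_t n = waiting.size(); n > 0; --n) {
            const std::size_t w = waiting.front();
            waiting.pop_front();
            (runnable(w) ? ready : waiting).push_back(w);
        }
    }
    return order;
}
```

A parallel version would have worker threads pop from `ready` under a lock or a concurrent queue. This sketch stays single-threaded so that the queue hand-off is easy to follow.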
Soumyajit Chatterjee
Senior Research Scientist, Bell Labs and Visiting Scholar, University of Cambridge
Pervasive Computing, Applied Machine Learning
Rahul Utkoor
QUALCOMM India Private Limited
Uppu Eshwar
Indian Institute of Technology, Hyderabad
Sathya Peri
Associate Professor, IIT Hyderabad
Parallel and Distributed Systems
V. Nandivada
Indian Institute of Technology, Madras