A Lock-Free Work-Stealing Algorithm for Bulk Operations

πŸ“… 2026-03-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the inefficiency of existing general-purpose work-stealing algorithms in specialized parallel solvers, where redundant synchronization overhead hinders effective support for batch operations. Focusing on master-worker mixed-integer programming solvers, the paper proposes a lock-free work-stealing queue tailored to the single-owner–single-thief model, natively supporting unbounded growth and batch operations. By simplifying synchronization mechanisms and ensuring linearizable semantics, the design enables lightweight batch enqueuing and stealing. Experimental results demonstrate that batch push latency remains constant, while steal latency is nearly unaffected by the steal ratio, achieving up to a 3Γ— performance improvement over state-of-the-art frameworks such as C++ Taskflow.

Technology Category

Application Category

πŸ“ Abstract
Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are designed for general-purpose parallel runtimes and often incur overheads that are unnecessary in specialized settings. In this paper, we present a new lock-free work-stealing queue tailored for a master-worker framework used in the parallelization of a mixed-integer programming optimization solver based on decision diagrams. Our design supports native bulk operations, grows without bounds, and assumes at most one owner and one concurrent stealer, thereby eliminating the need for heavy synchronization. We provide an informal sketch that our queue is linearizable and lock-free under this restricted concurrency model. Benchmarks demonstrate that our implementation achieves constant-latency push performance, remaining stable even as batch size increases, in contrast to existing queues from C++ Taskflow whose latencies grow sharply with batch size. Pop operations perform comparably across all implementations, while our steal operation maintains nearly flat latency across different steal proportions. We also explore an optimized steal variant that reduces latency by up to 3x in practice. Finally, a pseudo workload based on large-graph exploration confirms that all implementations scale linearly. However, we argue that solver workloads with irregular node processing times would further amplify the advantages of our algorithm.
Problem

Research questions and friction points this paper is trying to address.

work-stealing
bulk operations
lock-free
irregular workloads
parallel runtime
Innovation

Methods, ideas, or system contributions that make the work stand out.

lock-free
work-stealing
bulk operations
master-worker
linearizable
πŸ”Ž Similar Papers
No similar papers found.
R
Raja Sai Nandhan Yadav Kataru
Department of Computer Science, Iowa State University
D
Danial Davarnia
Edwardson School of Industrial Engineering, Purdue University
Ali Jannesari
Ali Jannesari
Associate Professor, Iowa State University
high-performance computingmachine learningparallel computingsoftware analytics