A Lock-Free Work-Stealing Algorithm for Bulk Operations

📅 2026-03-05

📈 Citations: 0

✨ Influential: 0

career value

225K/year

🤖 AI Summary

This work addresses the inefficiency of existing general-purpose work-stealing algorithms in specialized parallel solvers, where redundant synchronization overhead hinders effective support for batch operations. Focusing on master-worker mixed-integer programming solvers, the paper proposes a lock-free work-stealing queue tailored to the single-owner–single-thief model, natively supporting unbounded growth and batch operations. By simplifying synchronization mechanisms and ensuring linearizable semantics, the design enables lightweight batch enqueuing and stealing. Experimental results demonstrate that batch push latency remains constant, while steal latency is nearly unaffected by the steal ratio, achieving up to a 3× performance improvement over state-of-the-art frameworks such as C++ Taskflow.

Technology Category

Application Category

📝 Abstract

Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. However, existing algorithms are designed for general-purpose parallel runtimes and often incur overheads that are unnecessary in specialized settings. In this paper, we present a new lock-free work-stealing queue tailored for a master-worker framework used in the parallelization of a mixed-integer programming optimization solver based on decision diagrams. Our design supports native bulk operations, grows without bounds, and assumes at most one owner and one concurrent stealer, thereby eliminating the need for heavy synchronization. We provide an informal sketch that our queue is linearizable and lock-free under this restricted concurrency model. Benchmarks demonstrate that our implementation achieves constant-latency push performance, remaining stable even as batch size increases, in contrast to existing queues from C++ Taskflow whose latencies grow sharply with batch size. Pop operations perform comparably across all implementations, while our steal operation maintains nearly flat latency across different steal proportions. We also explore an optimized steal variant that reduces latency by up to 3x in practice. Finally, a pseudo workload based on large-graph exploration confirms that all implementations scale linearly. However, we argue that solver workloads with irregular node processing times would further amplify the advantages of our algorithm.

Problem

Research questions and friction points this paper is trying to address.

work-stealing

bulk operations

lock-free

irregular workloads

parallel runtime

Innovation

Methods, ideas, or system contributions that make the work stand out.

lock-free

work-stealing

bulk operations