🤖 AI Summary
Traditional semaphores suffer from poor scalability, high latency, and substantial memory overhead in multithreaded environments due to global spinning. To address these limitations, this paper proposes TWA-semaphore, a scalable semaphore that integrates the ticket-lock paradigm with a Waiting Array (WA) mechanism. It is the first work to incorporate a waiting array into semaphore design, enabling thread-local waiting and cache-friendly wakeups. By employing a compact memory layout and a lock-free core path, TWA-semaphore achieves O(1) space per semaphore while significantly improving concurrency performance. Experimental evaluation on multicore systems demonstrates that TWA-semaphore outperforms both the Linux kernel semaphore and pthread implementations by multiple-fold in throughput and reduces tail latency by one to two orders of magnitude, thereby overcoming the scalability bottleneck of global spinning inherent in the classical ticket-semaphore.
📝 Abstract
Semaphores are a widely used and foundational synchronization and coordination construct for shared-memory multithreaded programming. They are a keystone concept, in the sense that most other synchronization constructs can be implemented in terms of semaphores, although the converse does not generally hold. Semaphores and the quality of their implementation are of consequence as they remain heavily used in the Linux kernel and are also available for application programming via the pthreads programming interface. We first show that semaphores can be implemented by borrowing ideas from the classic ticket lock algorithm. The resulting "ticket-semaphore" algorithm is simple and compact (space efficient) but does not scale well because of the detrimental impact of global spinning. We then transform "ticket-semaphore" into the "TWA-semaphore" by applying techniques derived from the "TWA - Ticket Locks Augmented with a Waiting Array" algorithm, yielding a scalable semaphore that remains compact and has extremely low latency.
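The abstract's ticket-semaphore idea, generalizing a ticket lock to an initial count of permits, might be sketched roughly as follows. This is an illustrative reconstruction, not the paper's actual implementation: the struct and function names (`ticket_sem_t`, `ticket_sem_wait`, `ticket_sem_post`) are hypothetical, and the busy-wait loop deliberately exhibits the global spinning the paper identifies as the scalability problem.

```c
#include <stdatomic.h>

/* Hypothetical ticket-semaphore sketch.
 * A semaphore with initial count C hands out tickets in FIFO order;
 * the holder of ticket t may proceed once "grants" exceeds t.
 * Note the compact O(1) footprint: just two words per semaphore. */
typedef struct {
    atomic_ulong ticket;  /* next ticket to hand out */
    atomic_ulong grants;  /* total permits granted so far */
} ticket_sem_t;

void ticket_sem_init(ticket_sem_t *s, unsigned long count) {
    atomic_init(&s->ticket, 0);
    atomic_init(&s->grants, count);  /* first `count` tickets pass immediately */
}

void ticket_sem_wait(ticket_sem_t *s) {
    unsigned long t = atomic_fetch_add(&s->ticket, 1);
    /* Global spinning: every waiter polls the same shared word,
     * which is the bottleneck TWA's waiting array is meant to remove. */
    while (atomic_load(&s->grants) <= t)
        ;
}

void ticket_sem_post(ticket_sem_t *s) {
    atomic_fetch_add(&s->grants, 1);  /* releases the next ticket in line */
}
```

In the TWA-derived variant the abstract describes, waiters whose ticket is far from being granted would instead spin on a slot in a shared waiting array, so that a post invalidates only the cache line of the thread (or small set of threads) actually being woken.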