Aggregating Funnels for Faster Fetch&Add and Queues

📅 2024-11-21

🏛️ ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Fetch-and-add operations on a single memory location become a contention hot spot under high concurrency and limit scalability. This paper proposes Aggregating Funnels, an algorithm that spreads these operations across multiple memory locations and aggregates them into batches. Each batch is applied with a single hardware fetch-and-add instruction on one location, and every operation in the batch computes its own result with a fetch-and-add instruction on a different location. In experiments, Aggregating Funnels achieves higher throughput than earlier combining techniques such as Combining Funnels and scales substantially better than hardware fetch-and-add on a single location. Replacing the fetch-and-add instructions in the fastest state-of-the-art concurrent queue with Aggregating Funnels removes a bottleneck and greatly improves the queue's overall throughput.

📝 Abstract

Many concurrent algorithms require processes to perform fetch-and-add operations on a single memory location, which can be a hot spot of contention. We present a novel algorithm called Aggregating Funnels that reduces this contention by spreading the fetch-and-add operations across multiple memory locations. It aggregates fetch-and-add operations into batches so that the batch can be performed by a single hardware fetch-and-add instruction on one location and all operations in the batch can efficiently compute their results by performing a fetch-and-add instruction on a different location. We show experimentally that this approach achieves higher throughput than previous combining techniques, such as Combining Funnels, and is substantially more scalable than applying hardware fetch-and-add instructions on a single memory location. We show that replacing the fetch-and-add instructions in the fastest state-of-the-art concurrent queue by our Aggregating Funnels eliminates a bottleneck and greatly improves the queue's overall throughput.
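
The batching arithmetic described in the abstract can be pictured with a small sequential model. This is only a sketch under stated assumptions, not the paper's concurrent algorithm: the names `Counter`, `run_batch`, `main`, and `aggregator` are hypothetical, there is a single aggregator location, and the operations of a batch are replayed one after another instead of racing on real atomics.

```python
class Counter:
    """Hypothetical stand-in for one memory location with fetch-and-add."""

    def __init__(self, value=0):
        self.value = value
        self.faa_calls = 0  # number of fetch-and-add operations on this location

    def fetch_add(self, delta):
        old = self.value
        self.value += delta
        self.faa_calls += 1
        return old


def run_batch(main, aggregator, deltas):
    """Sequentially model one batch of fetch-and-add operations.

    Returns, for each operation, the value a plain fetch-and-add on `main`
    would have returned if the operations ran in aggregator order.
    """
    batch_start = aggregator.value
    # Each operation does its own fetch-and-add on the aggregator location,
    # which tells it its offset inside the batch.
    offsets = [aggregator.fetch_add(d) - batch_start for d in deltas]
    # One fetch-and-add on the main location applies the whole batch at once.
    batch_total = aggregator.value - batch_start
    main_before = main.fetch_add(batch_total)
    # Every operation derives its result from the batch's starting value.
    return [main_before + offset for offset in offsets]


main = Counter(100)
aggregator = Counter()
first = run_batch(main, aggregator, [1, 1, 5])   # [100, 101, 102]
second = run_batch(main, aggregator, [2, 3])     # [107, 109]
# main.value is now 112 after only 2 fetch-and-adds on the main location,
# while the 5 individual operations hit the aggregator location instead.
```

In this model the contended main location sees one fetch-and-add per batch, while the per-operation traffic lands on the aggregator. How batches are delimited and how waiting operations learn the batch's starting value under real concurrency is the part this sketch leaves out.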

Problem

Research questions and friction points this paper is trying to address.

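Fetch-and-add operations on a single memory location create a contention hot spot

Hardware fetch-and-add on one location scales poorly as concurrency grows

Fetch-and-add instructions are a throughput bottleneck in state-of-the-art concurrent queues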
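
To see why queues are affected, consider a toy queue in which every operation claims a slot index with a fetch-and-add. This is an illustrative sketch, not the queue evaluated in the paper: `PlainCounter` and `TicketQueue` are hypothetical names, the lock only emulates the atomicity that a hardware instruction provides, and the queue is neither circular nor safe against dequeues that overtake enqueues.

```python
import threading


class PlainCounter:
    """Hypothetical stand-in for hardware fetch-and-add on one location."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # emulates atomicity only

    def fetch_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old


class TicketQueue:
    """Toy FIFO: each operation claims a slot index via fetch-and-add."""

    def __init__(self, capacity, make_counter=PlainCounter):
        self.slots = [None] * capacity
        self.tail = make_counter()  # every enqueue hits this one counter
        self.head = make_counter()  # every dequeue hits this one counter

    def enqueue(self, item):
        ticket = self.tail.fetch_add(1)
        self.slots[ticket] = item  # raises IndexError once capacity is used up

    def dequeue(self):
        ticket = self.head.fetch_add(1)
        return self.slots[ticket]  # None if that slot was never filled


queue = TicketQueue(4)
queue.enqueue("a")
queue.enqueue("b")
first_out = queue.dequeue()   # "a"
second_out = queue.dequeue()  # "b"
```

Because `make_counter` is a parameter, any object with a compatible `fetch_add` method could stand in for the two counters, which is the kind of substitution the paper reports for its queue experiments.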

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregates fetch-and-add operations into batches

Spreads operations across multiple memory locations

Replaces the fetch-and-add instructions in a state-of-the-art concurrent queue, removing a bottleneck

🔎 Similar Papers

Sorting-based FPGA Sliding Window Aggregation Engine without off-chip Memories