AI Summary
Streaming graph partitioning often produces significantly more edge cuts than in-memory methods because a single-pass allocation strategy is sensitive to the order of the data stream. This work proposes an approach that integrates bounded-priority buffering, incremental construction of high-locality batches, and multilevel partitioning to delay poorly informed assignments and recover local graph structure, thereby substantially improving partition quality. Under adversarial streaming orders, the method reduces edge cuts by 20.8% compared to the strongest prioritized buffering baseline while achieving a 2.9× speedup and an 11.3× reduction in memory usage; against the next-best buffered method, it lowers edge cuts by 15.8% at a modest cost of 1.8× the runtime and 1.09× the memory.
Abstract
Streaming graph partitioners enable resource-efficient and massively scalable partitioning, but one-pass assignment heuristics are highly sensitive to stream order and often yield substantially higher edge cuts than in-memory methods. We present BuffCut, a buffered streaming partitioner that narrows this quality gap, particularly when stream ordering is adversarial, by combining prioritized buffering with batch-wise multilevel assignment. BuffCut maintains a bounded priority buffer to delay poorly informed decisions and regulate the order in which nodes are considered for assignment. It incrementally constructs high-locality batches of configurable size by iteratively inserting the highest-priority nodes from the buffer into the batch, effectively recovering locality structure from the stream. Each batch is then assigned via a multilevel partitioning algorithm. Experiments on diverse real-world and synthetic graphs show that BuffCut consistently outperforms state-of-the-art buffered streaming methods. Compared to the strongest prioritized buffering baseline, BuffCut achieves 20.8% fewer edge cuts while running 2.9 times faster and using 11.3 times less memory. Against the next-best buffered method, it reduces edge cut by 15.8% with only modest overheads of 1.8 times runtime and 1.09 times memory.
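The pipeline described in the abstract (bounded priority buffer, incremental batch construction, batch-wise assignment) can be illustrated with a minimal Python sketch. It is not BuffCut's implementation. The priority score (share of a node's neighbors that are already assigned or already in the current batch), the function and parameter names, and the linear-scan buffer are all assumptions made for illustration, and the batch step uses a simple greedy, capacity-weighted neighbor vote as a stand-in for the multilevel partitioner.

```python
from collections import Counter


def buffered_stream_partition(stream, k, capacity, buffer_size=4, batch_size=2):
    """Assign each streamed node to one of k blocks.

    stream yields (node, neighbors) pairs. capacity * k must be at least
    the number of nodes so that some block always has room.
    """
    block = {}      # node -> block id (final assignments)
    load = [0] * k  # number of nodes per block
    buffer = {}     # node -> neighbors, decision still delayed
    batch = {}      # node -> neighbors, selected for the next batch

    def priority(v):
        # Hypothetical score: share of v's neighbors that are already
        # assigned or already part of the current batch.
        nbrs = buffer[v]
        known = sum(1 for u in nbrs if u in block or u in batch)
        return known / max(1, len(nbrs))

    def assign_batch():
        # Stand-in for the multilevel step: each batch node joins the
        # non-full block holding most of its assigned neighbors,
        # discounted by how full that block already is.
        for v, nbrs in batch.items():
            seen = Counter(block[u] for u in nbrs if u in block)
            open_blocks = [b for b in range(k) if load[b] < capacity]
            best = max(
                open_blocks,
                key=lambda b: (seen[b] * (1 - load[b] / capacity), -load[b]),
            )
            block[v] = best
            load[best] += 1
        batch.clear()

    def move_best_to_batch():
        # Linear scan for clarity; a priority queue with score updates
        # would be the efficient choice for large buffers.
        v = max(buffer, key=priority)
        batch[v] = buffer.pop(v)
        if len(batch) >= batch_size:
            assign_batch()

    for v, nbrs in stream:
        buffer[v] = list(nbrs)
        if len(buffer) > buffer_size:  # buffer is bounded: release one node
            move_best_to_batch()
    while buffer:                      # stream ended: drain the buffer
        move_best_to_batch()
    if batch:
        assign_batch()
    return block


def edge_cut(adjacency, block):
    """Count undirected edges whose endpoints lie in different blocks."""
    return sum(
        1
        for v, nbrs in adjacency.items()
        for u in nbrs
        if v < u and block[v] != block[u]
    )


# Two triangles joined by the edge 2-3, streamed in an interleaved order.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
order = [0, 3, 1, 4, 2, 5]
parts = buffered_stream_partition(((v, adjacency[v]) for v in order), k=2, capacity=3)
print(parts, edge_cut(adjacency, parts))
```

In this sketch, `buffer_size` controls how long decisions can be delayed and `batch_size` controls how many nodes are assigned together. Both are illustrative knobs rather than the tool's actual parameters.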