🤖 AI Summary
In distributed systems, jointly scheduling tasks and data across machines is challenging, and data hotspots cause severe load imbalance. Method: This paper proposes a task-data orchestration abstraction in which each task is co-located with its target data by migrating tasks, data, or both, coupled with a lightweight distributed push-pull mechanism that keeps load balanced with low communication overhead even under highly skewed workloads. The resulting framework, TD-Orch, targets applications such as graph processing and key-value stores. Contributions/Results: Experiments show up to 2.7× speedup over existing distributed scheduling baselines. Built on this framework, the TDO-GP graph processing system applies three families of implementation techniques to exploit the TD-Orch execution flow and achieves a 4.1× average speedup over the best prior open-source distributed graph systems for general graph processing.
📝 Abstract
In this paper, we highlight a task-data orchestration abstraction that supports a range of distributed applications, including graph processing and key-value stores. Given a batch of tasks, each requesting one or more data items, where both tasks and data are distributed across multiple machines, each task must be co-located with its target data (by moving tasks and/or data) and executed. We present TD-Orch, an efficient and scalable orchestration framework featuring a simple application developer interface. TD-Orch employs a distributed push-pull technique, leveraging the bidirectional flow of both tasks and data to achieve scalable load balance across machines even under highly skewed data requests (data hot spots), with minimal communication overhead. Experimental results show that TD-Orch achieves up to 2.7x speedup over existing distributed scheduling baselines. Building on TD-Orch, we present TDO-GP, a distributed graph processing system for general graph problems, demonstrating the effectiveness of the underlying framework. We design three families of implementation techniques to fully leverage the execution flow provided by TD-Orch. Experimental results show that TDO-GP achieves an average speedup of 4.1x over the best prior open-source distributed graph systems for general graph processing.
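To make the orchestration problem concrete, the following is a minimal single-process sketch of one possible push-pull placement rule. It is an illustration under stated assumptions, not TD-Orch's algorithm or interface: the function name `orchestrate`, the `hot_threshold` cutoff, and the restriction to one data item per task are all hypothetical simplifications (the abstract allows a task to request several items). The rule pushes a task to the machine that owns its data when the item is rarely requested, and pulls a copy of the item to the requesting machine when the item is hot.

```python
from collections import Counter, defaultdict


def orchestrate(tasks, owner, hot_threshold):
    """Choose where each task runs (hypothetical rule, not TD-Orch's).

    tasks: list of (task_id, source_machine, key) tuples.
    owner: dict mapping each key to the machine that stores it.
    hot_threshold: assumed cutoff; a key requested more often is "hot".

    Returns (placement, data_copies), where placement maps task_id to the
    machine that executes it and data_copies maps a key to the set of
    machines that receive a copy of that data item.
    """
    demand = Counter(key for _, _, key in tasks)
    placement = {}
    data_copies = defaultdict(set)
    for task_id, src, key in tasks:
        if demand[key] > hot_threshold and src != owner[key]:
            # Hot item: pull the data to the requester, the task stays put.
            placement[task_id] = src
            data_copies[key].add(src)
        else:
            # Cold item (or already local): push the task to the owner.
            placement[task_id] = owner[key]
    return placement, dict(data_copies)


# Three tasks request the hot item "x" (owned by C); one requests "y".
tasks = [(0, "A", "x"), (1, "B", "x"), (2, "C", "x"), (3, "A", "y")]
owner = {"x": "C", "y": "B"}
placement, data_copies = orchestrate(tasks, owner, hot_threshold=2)
load = Counter(placement.values())
```

In this example a push-only policy would run three of the four tasks on machine C, while the sketch leaves at most two tasks on any machine at the cost of copying "x" to A and B. A real system would also need to weigh data size against task size and to make these decisions without a global view of demand, which this sketch does not attempt.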