DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the device assignment problem for asynchronous dataflow graphs in work-conserving systems, aiming to minimize execution time for complex machine learning workloads. Existing learning-based approaches are limited in three ways: they rely on bulk-synchronous systems whose barriers under-utilize devices, they ignore the scheduling semantics of the underlying system, and they depend exclusively on end-to-end reinforcement learning while discarding expert-designed heuristics. To overcome these limitations, the authors propose Doppler, a three-stage framework for training dual-policy networks: a SEL policy performs operation-level selection, while a PLC policy handles device-level placement; the two policies are decoupled and explicitly aligned with asynchronous scheduling semantics. Crucially, Doppler incorporates expert heuristics as structured priors, removing the reliance on synchronous assumptions and bridging the scheduling-awareness gap. Experiments show that Doppler reduces execution time across diverse multi-task workloads relative to all baselines, and improves sampling efficiency by shortening per-episode training time.

📝 Abstract
We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based methods often struggle due to three key limitations: (1) reliance on bulk-synchronous systems like TensorFlow, which under-utilize devices due to barrier synchronization; (2) lack of awareness of the scheduling mechanism of underlying systems when designing learning-based methods; and (3) exclusive dependence on reinforcement learning, ignoring the structure of effective heuristics designed by experts. In this paper, we propose \textsc{Doppler}, a three-stage framework for training dual-policy networks consisting of 1) a $\mathsf{SEL}$ policy for selecting operations and 2) a $\mathsf{PLC}$ policy for placing chosen operations on devices. Our experiments show that \textsc{Doppler} outperforms all baseline methods across tasks by reducing system execution time and additionally demonstrates sampling efficiency by reducing per-episode training time.
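The abstract describes two cooperating policies: SEL chooses which ready operation to handle next, and PLC chooses which device it runs on. The following is a minimal illustrative sketch of that interaction loop, not the paper's actual method: the function names, the greedy stand-ins for the learned SEL/PLC networks, and the additive cost model are all assumptions made for illustration.

```python
def sel_policy(frontier, graph):
    # Stand-in for the learned SEL network: pick the ready op with the
    # most successors (a simple criticality heuristic).
    return max(frontier, key=lambda op: len(graph.get(op, [])))

def plc_policy(op, device_load, cost):
    # Stand-in for the learned PLC network: place the op on the device
    # that would finish it earliest under the current load.
    return min(device_load, key=lambda d: device_load[d] + cost[op])

def assign(graph, cost, n_devices=2):
    """Assign every op in a DAG (adjacency dict) to a device.

    Returns the op -> device placement and a crude makespan estimate
    (the maximum total load on any device).
    """
    # Count unplaced predecessors to track which ops are ready.
    preds = {op: 0 for op in cost}
    for succs in graph.values():
        for s in succs:
            preds[s] += 1
    frontier = {op for op, k in preds.items() if k == 0}
    device_load = {d: 0.0 for d in range(n_devices)}
    placement = {}
    while frontier:
        op = sel_policy(frontier, graph)         # SEL: which op next
        dev = plc_policy(op, device_load, cost)  # PLC: which device
        placement[op] = dev
        device_load[dev] += cost[op]
        frontier.remove(op)
        # An op becomes ready once all its predecessors are placed.
        for s in graph.get(op, []):
            preds[s] -= 1
            if preds[s] == 0:
                frontier.add(s)
    return placement, max(device_load.values())
```

For a diamond-shaped graph (`a` fans out to `b` and `c`, which join at `d`) with costs 1, 2, 2, 1, the two parallel branches land on different devices and the load estimate is 3.0. In Doppler the two hand-written heuristics above would be replaced by trained networks, but the episode structure (select, then place, repeat) is the same.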
Problem

Research questions and friction points this paper is trying to address.

Assigning operations in dataflow graphs to minimize execution time
Overcoming limitations of bulk-synchronous systems like TensorFlow
Improving device utilization and scheduling awareness in ML workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-policy learning for device assignment
Three-stage training framework
Combines SEL and PLC policies
Xinyu Yao
Rice University
Daniel Bourgeois
Rice University
Abhinav Jain
Rice University
Yuxin Tang
Rice University
Jiawen Yao
Alibaba DAMO Academy
Zhimin Ding
Rice University
Arlei Silva
Rice University
Christopher Jermaine
Rice University