🤖 AI Summary
This paper addresses the large-scale, real-time Dynamic Task Assignment Problem (DTAP), where the goal is to minimize the average cycle time of tasks. It focuses on case-based processes, in which each task is a case made up of a stochastic sequence of activities and the decision is which employee to assign to which activity. To overcome the limitation of existing deep reinforcement learning (DRL) approaches, most of which are restricted to small synthetic scenarios, the paper proposes a general-purpose DRL-based decision support system. The method: (i) introduces a graph-structured observation and action space that gives a unified representation of DTAP instances of arbitrary scale; (ii) designs a reward function that is provably equivalent to minimizing the average cycle time; and (iii) integrates graph neural networks with process mining techniques to enable realistic scenario modeling. Evaluated on five large-scale instances whose parameters are derived from real business event logs, the approach matches or outperforms the best baseline on every instance. It also generalizes across different time horizons and across problem instances.
📝 Abstract
The Dynamic Task Assignment Problem (DTAP) concerns matching resources to tasks in real time while minimizing one or more objectives, such as resource costs or task cycle time. In this work, we consider a DTAP variant where every task is a case composed of a stochastic sequence of activities. In this variant, the DTAP involves deciding which employee to assign to which activity so that requests are processed as quickly as possible. In recent years, Deep Reinforcement Learning (DRL) has emerged as a promising tool for tackling this DTAP variant, but most research is limited to solving small-scale, synthetic problems, neglecting the challenges posed by real-world use cases. To bridge this gap, this work proposes a DRL-based Decision Support System (DSS) for real-world scale DTAPs. To this end, we introduce a DRL agent with two novel elements: a graph structure for observations and actions that can effectively represent any DTAP, and a reward function that is provably equivalent to the objective of minimizing the average cycle time of tasks. The combination of these two novelties allows the agent to learn effective and generalizable assignment policies for real-world scale DTAPs. The proposed DSS is evaluated on five DTAP instances whose parameters are extracted from real-world logs through process mining. The experimental evaluation shows that the proposed DRL agent matches or outperforms the best baseline in all DTAP instances and generalizes to different time horizons and across instances.
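To make the idea of a reward that is equivalent to cycle time minimization more concrete, the sketch below shows one well-known construction. It is not necessarily the reward defined in the paper, which the abstract does not spell out. The assumption here is that, between two consecutive decision points, the agent is penalized by the total time that cases spent in the system during that interval. Summed over an episode that covers every case from arrival to completion, these penalties equal the negative total cycle time, so for a fixed set of cases maximizing the return minimizes the average cycle time. The names `Case`, `total_cycle_time`, and `stepwise_rewards` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    """A hypothetical case with its arrival and completion timestamps."""
    arrival: float
    completion: float


def total_cycle_time(cases: List[Case]) -> float:
    """Sum of (completion - arrival) over all cases."""
    return sum(c.completion - c.arrival for c in cases)


def stepwise_rewards(cases: List[Case], decision_times: List[float]) -> List[float]:
    """Assumed reward: for each interval between consecutive decision times,
    the negative total time that cases spent in the system in that interval."""
    rewards = []
    prev = decision_times[0]
    for t in decision_times[1:]:
        # Overlap of each case's lifetime [arrival, completion] with (prev, t].
        penalty = sum(
            max(0.0, min(c.completion, t) - max(c.arrival, prev))
            for c in cases
        )
        rewards.append(-penalty)
        prev = t
    return rewards


# Illustrative data (made up): three cases and decision points that span
# the whole episode, from the first arrival to the last completion.
cases = [Case(0.0, 4.0), Case(1.0, 3.0), Case(2.0, 7.0)]
decision_times = [0.0, 1.0, 2.0, 3.0, 4.0, 7.0]

episode_return = sum(stepwise_rewards(cases, decision_times))
average_cycle_time = total_cycle_time(cases) / len(cases)

# The undiscounted return equals the negative total cycle time.
assert abs(episode_return + total_cycle_time(cases)) < 1e-9
```

The identity holds only when the decision times cover every case's full lifetime and the return is undiscounted. With discounting or a truncated horizon, the two objectives can diverge.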