🤖 AI Summary
This work addresses large-scale combinatorial optimisation problems under uncertainty, where exact solution methods are intractable and imitation learning (IL) offers a data-driven alternative. Existing IL studies, however, lack a unifying framework for characterising the experts that generate their training demonstrations. The paper introduces a systematic three-dimensional taxonomy of such experts, covering their treatment of uncertainty, their level of optimality, and their mode of interaction with the learner. It also develops a generalised Dataset Aggregation (DAgger) algorithm that supports multiple expert queries, expert aggregation, and flexible interaction strategies, so that any expert type, including two-stage and multi-stage stochastic formulations, can supply demonstrations. Experiments on a dynamic physician-to-patient assignment problem with stochastic arrivals and capacity constraints show three things. Policies learned from stochastic experts consistently outperform those learned from deterministic or full-information experts. Interactive learning improves solution quality with fewer expert demonstrations. Aggregated deterministic experts are an effective alternative when stochastic optimisation becomes computationally challenging.
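The three taxonomy dimensions can be made concrete as a small data structure. The following is a minimal sketch, not the authors' code. The category names follow the abstract below, while the class names, the one-line glosses in the comments, and the two helper methods are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Uncertainty(Enum):
    """Dimension (i): how the expert treats uncertainty (glosses are assumptions)."""
    MYOPIC = "myopic"                       # optimises only the current decision
    DETERMINISTIC = "deterministic"         # plans against a single forecast of the future
    FULL_INFORMATION = "full_information"   # plans with the realised future (hindsight)
    TWO_STAGE = "two_stage_stochastic"      # first-stage decision plus recourse over scenarios
    MULTI_STAGE = "multi_stage_stochastic"  # scenario tree with decisions at every stage


class Optimality(Enum):
    """Dimension (ii): level of optimality."""
    TASK_OPTIMAL = "task_optimal"
    APPROXIMATE = "approximate"


class Interaction(Enum):
    """Dimension (iii): interaction mode with the learner."""
    ONE_SHOT = "one_shot"        # demonstrations collected once, before training
    INTERACTIVE = "interactive"  # expert relabels states the learner visits (DAgger-style)


@dataclass(frozen=True)
class ExpertProfile:
    uncertainty: Uncertainty
    optimality: Optimality
    interaction: Interaction

    def uses_scenarios(self) -> bool:
        # Hypothetical helper: only the stochastic formulations optimise over scenarios.
        return self.uncertainty in (Uncertainty.TWO_STAGE, Uncertainty.MULTI_STAGE)

    def is_implementable(self) -> bool:
        # Hypothetical helper: a full-information expert needs the realised future,
        # so it can label data offline but cannot itself act as a deployable policy.
        return self.uncertainty is not Uncertainty.FULL_INFORMATION


profile = ExpertProfile(Uncertainty.TWO_STAGE, Optimality.TASK_OPTIMAL, Interaction.INTERACTIVE)
print(profile.uses_scenarios(), profile.is_implementable())
```

Keeping the dimensions as independent enums mirrors the idea that they are orthogonal. For example, a deterministic expert can be approximate and one-shot, or task-optimal and interactive.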
📝 Abstract
Imitation learning (IL) provides a data-driven framework for approximating policies for large-scale combinatorial optimisation problems formulated as sequential decision problems (SDPs), where exact solution methods are computationally intractable. A central but underexplored aspect of IL in this context is the role of the expert that generates training demonstrations. Existing studies employ a wide range of expert constructions, yet lack a unifying framework to characterise their modelling assumptions, computational properties, and impact on learning performance. This paper introduces a systematic taxonomy of experts for IL in combinatorial optimisation under uncertainty. Experts are classified along three dimensions: (i) their treatment of uncertainty, including myopic, deterministic, full-information, two-stage stochastic, and multi-stage stochastic formulations; (ii) their level of optimality, distinguishing task-optimal and approximate experts; and (iii) their interaction mode with the learner, ranging from one-shot supervision to iterative, interactive schemes. Building on this taxonomy, we propose a generalised Dataset Aggregation (DAgger) algorithm that supports multiple expert queries, expert aggregation, and flexible interaction strategies. The proposed framework is evaluated on a dynamic physician-to-patient assignment problem with stochastic arrivals and capacity constraints. Computational experiments compare learning outcomes across expert types and interaction regimes. The results show that policies learned from stochastic experts consistently outperform those learned from deterministic or full-information experts, while interactive learning improves solution quality using fewer expert demonstrations. Aggregated deterministic experts provide an effective alternative when stochastic optimisation becomes computationally challenging.
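The generalised DAgger loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm. The majority-vote `aggregate` rule, the lookup-table learner `fit`, the geometric mixing schedule `decay ** i` (the schedule commonly used with the original DAgger), and the toy queue problem are all hypothetical stand-ins. The structure is the part that reflects the abstract: every expert is queried at each visited state, their labels are aggregated, and the dataset only grows across rounds.

```python
import random
from collections import Counter
from typing import Any, Callable, List, Optional, Sequence, Tuple

State = Any
Action = Any
Policy = Callable[[State], Action]  # experts and learned policies share this signature


def aggregate(labels: Sequence[Action]) -> Action:
    """Hypothetical aggregation rule: majority vote over expert labels (ties -> first seen)."""
    return Counter(labels).most_common(1)[0][0]


def fit(dataset: List[Tuple[State, Action]]) -> Policy:
    """Stand-in learner: memorise the majority label for each visited state."""
    by_state = {}
    for s, a in dataset:
        by_state.setdefault(s, []).append(a)
    table = {s: aggregate(acts) for s, acts in by_state.items()}
    default = aggregate([a for _, a in dataset])  # fallback for states never visited
    return lambda s: table.get(s, default)


def generalised_dagger(
    experts: Sequence[Policy],
    reset: Callable[[random.Random], State],
    step: Callable[[State, Action, random.Random], State],
    iterations: int = 5,
    episodes: int = 10,
    horizon: int = 20,
    decay: float = 0.5,
    seed: int = 0,
) -> Policy:
    rng = random.Random(seed)
    dataset: List[Tuple[State, Action]] = []
    policy: Optional[Policy] = None
    for i in range(iterations):
        beta = decay ** i  # probability of executing the expert label; 1.0 in round 0
        for _ in range(episodes):
            s = reset(rng)
            for _ in range(horizon):
                # Query every expert at the state the current mixture actually visits.
                label = aggregate([expert(s) for expert in experts])
                dataset.append((s, label))  # dataset aggregation: old data is never discarded
                follow_expert = policy is None or rng.random() < beta
                s = step(s, label if follow_expert else policy(s), rng)
        policy = fit(dataset)  # retrain on the union of all rounds
    return policy


# Hypothetical toy problem: state = queue length, action 1 = serve up to 2 waiting patients.
def step(q: int, a: int, rng: random.Random) -> int:
    return min(10, max(0, q - 2 * a) + rng.randint(0, 2))


experts = [lambda q: int(q >= 2), lambda q: int(q >= 3), lambda q: int(q >= 2)]
policy = generalised_dagger(experts, reset=lambda rng: 0, step=step)
print([policy(q) for q in range(4)])
```

Setting `iterations=1` reduces the loop to one-shot supervision on expert rollouts, while `decay` controls how quickly the learner takes over data collection in the interactive regime. In a realistic setting each expert call would solve an optimisation model (for example, a scenario-based stochastic program) rather than apply a threshold rule, which is where the computational cost of stochastic experts arises.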