🤖 AI Summary
To address the limited reliability of single-expert learning-to-defer (L2D) in high-stakes scenarios, this paper proposes a multi-expert collaborative Top-*k* L2D framework. First, it employs a two-stage mechanism that dynamically selects the *k* most confident agents to jointly answer each query. Second, it introduces an input-adaptive Top-*k*(x) strategy that automatically determines the optimal number of experts per instance based on sample complexity and deferral cost. Theoretically, the paper formulates a novel Bayes-consistent and (*R*, *G*)-consistent surrogate loss for agent-level deferral and proves that model cascades are a special case of this framework. Methodologically, it integrates multi-agent confidence modeling with joint cost- and capability-aware decision-making. Experiments on multi-class classification and regression benchmarks show that the approach significantly outperforms single-expert L2D, achieving a superior trade-off between reliability and cost efficiency.
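To make the Top-*k* allocation idea concrete, here is a minimal sketch of selecting the *k* agents to consult. The function name `top_k_defer` and the confidence-minus-cost score are illustrative assumptions, not the paper's learned surrogate-based rule:

```python
def top_k_defer(confidences, costs, k):
    """Illustrative sketch only (not the paper's learned allocation rule):
    score each agent by confidence minus its consultation cost and
    return the indices of the k highest-scoring agents, best first."""
    scores = [c - q for c, q in zip(confidences, costs)]
    # Stable sort by descending score; ties keep original agent order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]

# Hypothetical example: four agents with varying confidence and cost.
chosen = top_k_defer(confidences=[0.9, 0.6, 0.8, 0.4],
                     costs=[0.3, 0.1, 0.1, 0.0], k=2)
```

In the actual framework this scoring is replaced by a trained model optimizing the surrogate loss; the sketch only shows the shape of the decision (rank agents, consult the top *k*).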
📝 Abstract
Learning-to-Defer (L2D) enables decision-making systems to improve reliability by selectively deferring uncertain predictions to more competent agents. However, most existing approaches focus exclusively on single-agent deferral, which is often inadequate in high-stakes scenarios that require collective expertise. We propose Top-$k$ Learning-to-Defer, a generalization of the classical two-stage L2D framework that allocates each query to the $k$ most confident agents instead of a single one. To further enhance flexibility and cost-efficiency, we introduce Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of agents to consult for each query, based on input complexity, agent competency distributions, and consultation costs. For both settings, we derive a novel surrogate loss and prove that it is Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent, ensuring convergence to the Bayes-optimal allocation. Notably, we show that the well-established model cascades paradigm arises as a restricted instance of our Top-$k$ and Top-$k(x)$ formulations. Extensive experiments across diverse benchmarks demonstrate the effectiveness of our framework on both classification and regression tasks.
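The adaptive Top-$k(x)$ idea, choosing how many agents to consult per query by trading residual error against consultation cost, can be sketched as follows. The function name `adaptive_k`, the independence assumption on agent errors, and the simple miss-plus-cost objective are all assumptions for illustration; the paper instead learns $k(x)$ via its consistent surrogate loss:

```python
def adaptive_k(confidences, costs):
    """Illustrative Top-k(x) sketch: consult agents in decreasing order of
    confidence and pick the k that minimizes a toy objective:
    (probability all consulted agents err, assuming independence)
    + (total consultation cost). Not the paper's surrogate-based rule."""
    ranked = sorted(zip(confidences, costs), key=lambda t: -t[0])
    best_k, best_obj = 1, float("inf")
    total_cost, miss = 0.0, 1.0
    for k, (conf, cost) in enumerate(ranked, start=1):
        total_cost += cost
        miss *= (1.0 - conf)  # chance every consulted agent is wrong
        obj = miss + total_cost
        if obj < best_obj:
            best_k, best_obj = k, obj
    return best_k
```

For example, with moderately cheap agents a second opinion lowers the objective, while expensive agents push the optimum back to a single expert, mirroring how the learned $k(x)$ should shrink as consultation costs grow.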