🤖 AI Summary
This work addresses a key limitation of standard Learning-to-Defer, which assumes that the information available to each expert is fixed at decision time and therefore cannot handle settings where auxiliary inputs, such as retrieved evidence, are acquired dynamically. The authors propose Learning-to-Defer with advice, which jointly learns routing and advice selection through an augmented surrogate loss defined over the composite expert-advice action space. Their theoretical analysis shows that separated surrogates, which learn routing and advice with distinct heads, are inconsistent even in the smallest non-trivial setting. For the augmented surrogate, it establishes an $\mathcal{H}$-consistency guarantee and an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multimodal tasks show improvements over standard Learning-to-Defer, with advice-acquisition behavior that adapts to the cost regime, and a synthetic benchmark confirms the predicted failure of separated surrogates.
📝 Abstract
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
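To make the composite expert-advice action space concrete, the following minimal sketch compares two decision rules on a single input. It is a toy illustration under stated assumptions, not the paper's surrogate loss or its inconsistency construction: the cost table `expected_cost`, its numbers, and the helper names `joint_policy` and `factored_policy` are all hypothetical. The first rule picks the expert and the advice together as one pair, and the second picks each separately from averaged costs.

```python
# Hypothetical expected costs for a single input x (numbers are made up).
# expected_cost[j][a] = expected cost of routing to expert j with advice option a,
# assumed to already include any price charged for acquiring the advice.
# Advice option 0 stands for "no advice", option 1 for "with advice".
expected_cost = [
    [0.30, 0.35],  # expert 0: advice adds cost without helping
    [0.90, 0.10],  # expert 1: weak alone, strong once it receives advice
]


def joint_policy(cost):
    """Return the (expert, advice) pair with the lowest expected cost."""
    pairs = [(j, a) for j in range(len(cost)) for a in range(len(cost[j]))]
    return min(pairs, key=lambda pair: cost[pair[0]][pair[1]])


def factored_policy(cost):
    """Choose expert and advice separately from averaged (marginal) costs."""
    n_experts, n_advice = len(cost), len(cost[0])
    # Average cost of each expert across advice options.
    expert_avg = [sum(row) / n_advice for row in cost]
    # Average cost of each advice option across experts.
    advice_avg = [
        sum(cost[j][a] for j in range(n_experts)) / n_experts
        for a in range(n_advice)
    ]
    expert = min(range(n_experts), key=expert_avg.__getitem__)
    advice = min(range(n_advice), key=advice_avg.__getitem__)
    return expert, advice


joint = joint_policy(expected_cost)
factored = factored_policy(expected_cost)
print("joint:", joint, expected_cost[joint[0]][joint[1]])
print("factored:", factored, expected_cost[factored[0]][factored[1]])
```

On this table the joint rule selects expert 1 with advice, at cost 0.10. The factored rule selects expert 0 with advice, at cost 0.35, because expert 1 looks poor on average even though it is the best choice once paired with advice. This only illustrates why the value of advice can depend on which expert receives it. The abstract's inconsistency result concerns learned surrogate losses, which this sketch does not model.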