Learning-to-Defer with Expert-Conditioned Advice

📅 2026-03-15
🤖 AI Summary
This work addresses a key limitation of conventional Learning-to-Defer approaches, which assume that the information available to each expert is fixed and therefore do not fit modern systems where additional information, such as retrieval results or tool outputs, can be supplied after an expert is selected. The paper formulates this setting as Learning-to-Defer with advice and optimizes over a composite action space of expert-advice pairs. It shows that separated surrogates, which learn routing and advice with distinct heads, are inconsistent even in the smallest non-trivial setting, and it introduces an augmented surrogate loss on the composite action space that avoids this inconsistency. The augmented surrogate comes with an H-consistency guarantee and an excess-risk transfer bound, so it recovers the Bayes-optimal policy in the limit. Experiments on tabular, large language model, and multimodal tasks show improvements over standard Learning-to-Defer, with advice acquisition that adapts to the cost regime.

📝 Abstract
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, are inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, LLM, and multimodal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime.
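As a rough illustration of the composite expert-advice action space described in the abstract, the sketch below picks the (expert, advice) pair with the lowest expected cost for a single input. It is a toy decision rule under stated assumptions, not the paper's surrogate or training procedure: the function `best_action`, the additive cost model (error probability plus an expert fee plus an advice fee), and all numbers are hypothetical.

```python
def best_action(error_prob, expert_fee, advice_fee):
    """Return the (expert, advice) pair with the lowest expected cost.

    error_prob maps each (expert, advice) pair to an assumed probability
    that the expert errs on this input when given that advice.
    The additive cost model below is an illustrative assumption.
    """
    def cost(pair):
        expert, advice = pair
        return error_prob[pair] + expert_fee[expert] + advice_fee[advice]

    # Search the composite action space directly instead of choosing
    # the expert and the advice in two independent steps.
    return min(error_prob, key=cost)


# Hypothetical error probabilities for one input.
error_prob = {
    ("model", "none"): 0.30,
    ("model", "retrieval"): 0.10,
    ("human", "none"): 0.15,
    ("human", "retrieval"): 0.12,
}
expert_fee = {"model": 0.0, "human": 0.10}

# Same input under two advice-cost regimes.
cheap = best_action(error_prob, expert_fee, {"none": 0.0, "retrieval": 0.05})
pricey = best_action(error_prob, expert_fee, {"none": 0.0, "retrieval": 0.25})
print(cheap, pricey)
```

In this toy setup, raising the retrieval fee changes both the advice and the selected expert, which is the kind of coupling between routing and advice that motivates treating the pair as one action.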
Problem

Research questions and friction points this paper is trying to address.

Learning-to-Defer
expert advice
information acquisition
decision routing
composite action space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning-to-Defer
expert-conditioned advice
joint action space
H-consistency
surrogate loss
Yannis Montreuil
PhD Candidate
Machine Learning, Statistical Learning, Human-AI collaboration
Leina Montreuil
Département de Mathématiques, Sorbonne University
Axel Carlier
ISAE-SUPAERO
AI, Multimedia
Lai Xing Ng
Agency for Science, Technology and Research, Institute for Infocomm Research
Wei Tsang Ooi
National University of Singapore
Multimedia Systems, Interactive Systems, Intelligent Systems