When More Experts Hurt: Underfitting in Multi-Expert Learning to Defer

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inherent underfitting problem in multi-expert learning caused by expert non-identifiability, which severely limits model performance. It is the first to reveal that this issue stems from the non-uniqueness of expert identities in Learning to Defer (L2D) tasks under multi-expert settings. To resolve this, the authors propose PiCCE, a novel method that leverages a surrogate loss integrating empirical confidence and correctness to dynamically identify the optimal expert, thereby reducing multi-expert learning to a single-expert-like framework. Theoretical analysis establishes the consistency and probabilistic recoverability of the proposed approach. Extensive experiments across diverse real-world and simulated expert scenarios demonstrate that PiCCE significantly enhances both predictive accuracy and training stability.
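The summary describes selecting one expert per sample by combining empirical confidence with observed correctness. The paper's exact PiCCE scoring rule is not given here, so the following is a minimal hypothetical sketch: the `alpha` trade-off, the score form, and the function name `pick_expert` are all assumptions for illustration.

```python
import numpy as np

def pick_expert(expert_probs, expert_correct, alpha=0.5):
    """Select one expert per sample from confidence and correctness.

    Hypothetical illustration only; PiCCE's actual rule may differ.

    expert_probs:   (n_samples, n_experts) confidence each expert assigns
                    to its own prediction.
    expert_correct: (n_samples, n_experts) 1 if the expert's prediction
                    matches the true label, else 0.
    alpha:          assumed trade-off weight between correctness and
                    confidence.
    """
    score = alpha * expert_correct + (1 - alpha) * expert_probs
    # Index of the highest-scoring expert for each sample.
    return np.argmax(score, axis=1)
```

Once a single expert is fixed per sample this way, the multi-expert deferral problem collapses to the single-expert case, which is the reduction the summary describes.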

📝 Abstract
Learning to Defer (L2D) enables a classifier to abstain from predictions and defer to an expert, and has recently been extended to multi-expert settings. In this work, we show that multi-expert L2D is fundamentally more challenging than the single-expert case. With multiple experts, the classifier's underfitting becomes inherent, which seriously degrades prediction performance, whereas in the single-expert setting it arises only under specific conditions. We theoretically reveal that this stems from an intrinsic expert identifiability issue: learning which expert to trust from a diverse pool, a problem that is absent in the single-expert case and that renders existing underfitting remedies ineffective. To tackle this issue, we propose PiCCE (Pick the Confident and Correct Expert), a surrogate-based method that adaptively identifies a reliable expert based on empirical evidence. PiCCE effectively reduces multi-expert L2D to a single-expert-like learning problem, thereby resolving multi-expert underfitting. We further prove its statistical consistency and ability to recover class probabilities and expert accuracies. Extensive experiments across diverse settings, including real-world expert scenarios, validate our theoretical results and demonstrate improved performance.
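After the reduction to a single-expert-like problem, training proceeds with a standard single-expert L2D surrogate. As a point of reference, here is a sketch of the common cross-entropy-style surrogate in which deferral is trained as an extra "class" that counts as correct whenever the selected expert is; this is a generic L2D loss, not the paper's specific PiCCE loss.

```python
import numpy as np

def l2d_surrogate_loss(logits, y, expert_correct):
    """Cross-entropy-style surrogate for single-expert learning to defer.

    A sketch of the standard softmax L2D surrogate (an assumption here,
    not PiCCE's exact loss).

    logits:         (K+1,) scores for K classes plus one deferral option.
    y:              true class index in [0, K).
    expert_correct: 1 if the selected expert predicts y correctly, else 0.
    """
    # Numerically stable log-softmax over the K+1 options.
    z = logits - logits.max()
    log_p = z - np.log(np.exp(z).sum())
    K = len(logits) - 1  # index of the deferral option
    # Reward predicting the true class; reward deferring only when the
    # expert would have been correct.
    return -(log_p[y] + expert_correct * log_p[K])
```

With uniform logits over two classes plus deferral and a correct expert, the loss is `2 * log(3)`, since both the true class and the deferral option carry probability 1/3.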
Problem

Research questions and friction points this paper is trying to address.

multi-expert learning
learning to defer
underfitting
expert identifiability
classifier abstention
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-expert learning to defer
underfitting
expert identifiability
PiCCE
statistical consistency