🤖 AI Summary
Large language models suffer from hallucinations and high inference costs. Method: This thesis studies learning with multi-expert deferral, which routes uncertain inputs to stronger experts for improved reliability and simpler inputs to lightweight models for improved efficiency. It develops a unified theoretical treatment of multi-class learning with abstention (score-based and predictor-rejector formulations) and general multi-expert deferral; introduces new families of surrogate losses with non-asymptotic, hypothesis set-specific $H$-consistency guarantees, resolving two open questions; covers both single-stage and two-stage training; and extends deferral to regression over continuous label spaces, a setting previously limited to classification. Results: On CIFAR-10, CIFAR-100, and SVHN, the proposed algorithms outperform existing baselines. In the two-stage setting, the surrogate losses are shown to be realizable $H$-consistent for constant cost functions, and the regression-deferral framework subsumes prior work on regression with abstention.
📝 Abstract
Large language models (LLMs) have achieved remarkable performance but face critical challenges: hallucinations and high inference costs. Leveraging multiple experts offers a solution: deferring uncertain inputs to more capable experts improves reliability, while routing simpler queries to smaller, distilled models enhances efficiency. This motivates the problem of learning with multi-expert deferral. This thesis presents a comprehensive study of this problem and the related problem of learning with abstention, supported by strong consistency guarantees.
First, for learning with abstention (a special case of deferral), we analyze score-based and predictor-rejector formulations in multi-class classification. We introduce new families of surrogate losses and prove strong non-asymptotic, hypothesis set-specific consistency guarantees, resolving two existing open questions. We analyze both single-stage and practical two-stage settings, with experiments on CIFAR-10, CIFAR-100, and SVHN demonstrating the superior performance of our algorithms.
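To make the score-based formulation concrete, here is a minimal sketch of inference-time abstention, where rejection is modeled as one extra category alongside the class scores. The function name and the example scores are illustrative placeholders, not the thesis's actual surrogate losses or trained models.

```python
import numpy as np

def predict_with_abstention(scores):
    """Score-based abstention: rejection is an extra category.

    `scores` has shape (n_samples, n_classes + 1); the last column is
    the rejection score. Returns the predicted class index per sample,
    or -1 when the rejection score is the largest (abstain).
    """
    n_classes = scores.shape[1] - 1
    choice = np.argmax(scores, axis=1)
    return np.where(choice == n_classes, -1, choice)

# Hypothetical scores for 3 samples over 2 classes plus a reject option.
scores = np.array([
    [2.0, 0.5, 1.0],   # confident in class 0
    [0.4, 0.3, 1.5],   # rejection score dominates -> abstain
    [0.2, 1.1, 0.9],   # class 1 wins
])
print(predict_with_abstention(scores).tolist())  # [0, -1, 1]
```

The predictor-rejector formulation differs in that it learns a separate rejection function $r$ alongside the predictor, rather than folding rejection into the score vector.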
Second, we address general multi-expert deferral in classification. We design new surrogate losses for both single-stage and two-stage scenarios and prove they benefit from strong $H$-consistency bounds. For the two-stage scenario, we show that our surrogate losses are realizable $H$-consistent for constant cost functions, leading to effective new algorithms.
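A two-stage deferral rule can be sketched as follows: a base predictor is trained first, and a deferral function then routes each input either to that predictor or to the expert with the lowest combined error-plus-consultation cost. All quantities below (confidences, accuracies, costs) are illustrative stand-ins for the learned deferral scores studied in the thesis, not its actual algorithm.

```python
import numpy as np

def defer(model_conf, expert_accuracies, expert_costs):
    """Route each input to the base model or the best expert.

    model_conf: (n,) confidence of the base predictor per sample.
    expert_accuracies: (k,) estimated accuracy of each expert.
    expert_costs: (k,) consultation cost of each expert.
    Returns -1 to keep the base model's prediction, else the
    chosen expert's index.
    """
    # Expected loss of each expert = error rate + consultation cost.
    expert_risk = (1.0 - np.asarray(expert_accuracies)) + np.asarray(expert_costs)
    best = int(np.argmin(expert_risk))
    model_risk = 1.0 - np.asarray(model_conf)
    return np.where(model_risk <= expert_risk[best], -1, best)

conf = [0.95, 0.4]                 # confident vs. uncertain sample
print(defer(conf, expert_accuracies=[0.9, 0.99],
            expert_costs=[0.02, 0.1]).tolist())  # [-1, 1]
```

The cost terms are what make constant cost functions a natural special case: with constant costs, the routing rule reduces to a fixed threshold on the model's confidence.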
Finally, we introduce a novel framework for regression with deferral to address continuous label spaces. Our versatile framework accommodates multiple experts and various cost structures, supporting both single-stage and two-stage methods. It subsumes recent work on regression with abstention. We propose new surrogate losses with proven $H$-consistency and demonstrate the empirical effectiveness of the resulting algorithms.
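In the regression setting, the same idea applies to continuous labels: defer when the model's estimated error exceeds the best expert's estimated error plus its cost. The sketch below is a simplified single-sample decision rule under that assumption; the error estimates are hypothetical placeholders for learned scores, not the thesis's surrogate-loss construction.

```python
import numpy as np

def regression_defer(model_err_estimate, expert_err_estimates, expert_costs):
    """Deferral rule for regression with multiple experts.

    Compares the model's estimated regression error against each
    expert's estimated error plus its consultation cost.
    Returns -1 to keep the model's prediction, else the expert index.
    """
    totals = np.asarray(expert_err_estimates) + np.asarray(expert_costs)
    best = int(np.argmin(totals))
    return -1 if model_err_estimate <= totals[best] else best

# Expert 0 is cheap but rough; expert 1 is accurate but costly.
print(regression_defer(0.3, [0.1, 0.05], [0.1, 0.4]))   # 0 (defer)
print(regression_defer(0.15, [0.1, 0.05], [0.1, 0.4]))  # -1 (keep model)
```

Setting the number of experts to one and the cost to a constant recovers regression with abstention as a special case, which is the sense in which the framework subsumes that earlier setting.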