L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification

📅 2026-04-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses instance-level model selection in clinical text classification. Specialized fine-tuned models such as ClinicalBERT and general-purpose large language models (LLMs) each have distinct strengths, but they are not adaptively coordinated. The authors propose L2D-Clinical, a learning-to-defer framework that decides, per instance, whether a BERT-based classifier should defer its prediction to an LLM. Deferral decisions are based on uncertainty signals and text characteristics; for the MIMIC-IV task, ground-truth labels come from a multi-LLM consensus. On adverse drug event (ADE) detection and MIMIC-IV treatment outcome classification, the approach reaches F1 scores of 0.928 (+1.7 points over BERT) and 0.980 (+9.3 points over BERT) while deferring only 7% and 16.8% of instances, respectively, which keeps LLM API costs low.
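The summary lists uncertainty signals and text characteristics as inputs to the deferral decision, but the exact features are not specified here. The sketch below assumes three common uncertainty signals computed from the BERT classifier's softmax output (top-class probability, top-two margin, and entropy) plus a crude whitespace length feature. `deferral_features` is a hypothetical helper for illustration, not the authors' code.

```python
import math
from typing import Dict, Sequence


def deferral_features(probs: Sequence[float], text: str) -> Dict[str, float]:
    """Hypothetical per-instance features for a deferral router.

    probs: softmax class probabilities from the BERT classifier (assumed input).
    text:  the raw clinical text being classified.
    """
    if not probs or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probs must be a non-empty probability distribution")
    ranked = sorted(probs, reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    # Shannon entropy in nats; terms with p == 0 contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {
        "max_prob": top,                        # confidence in the predicted class
        "margin": top - second,                 # gap between the top two classes
        "entropy": entropy,                     # spread of the whole distribution
        "n_tokens": float(len(text.split())),   # whitespace token count (length proxy)
    }
```

For example, `deferral_features([0.55, 0.45], "Patient developed rash after amoxicillin.")` yields a small margin and an entropy close to the two-class maximum of ln 2 (about 0.693), the kind of low-confidence instance a router would plausibly be more inclined to defer.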

📝 Abstract
Clinical text classification requires choosing between specialized fine-tuned models (BERT variants) and general-purpose large language models (LLMs), yet neither dominates across all instances. We introduce Learning to Defer for clinical text (L2D-Clinical), a framework that learns when a BERT classifier should defer to an LLM based on uncertainty signals and text characteristics. Unlike prior L2D work, which defers to human experts assumed to be universally superior, our approach enables adaptive deferral, improving accuracy where the LLM complements BERT. We evaluate on two English clinical tasks: (1) adverse drug event (ADE) detection (ADE Corpus V2), where BioBERT (F1=0.911) outperforms the LLM (F1=0.765), and (2) treatment outcome classification (MIMIC-IV with multi-LLM consensus ground truth), where GPT-5-nano (F1=0.967) outperforms ClinicalBERT (F1=0.887). On ADE, L2D-Clinical achieves F1=0.928 (+1.7 points over BERT) by selectively deferring 7% of instances, where the LLM's high recall compensates for BERT's misses. On MIMIC, L2D-Clinical achieves F1=0.980 (+9.3 points over BERT) by deferring only 16.8% of cases to the LLM. The key insight is that L2D-Clinical learns to selectively leverage LLM strengths while minimizing API costs.
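The abstract describes a router that learns when BERT should defer to the LLM, without giving the training objective here. The following is a minimal sketch under stated assumptions: a logistic-regression router over per-instance features (for example, BERT uncertainty and text length), trained on an assumed target that marks instances where BERT is wrong and the LLM is right. All names (`deferral_targets`, `Router`, `train_router`, `route`, `f1`) are hypothetical, and the toy data in the `__main__` block is invented purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def deferral_targets(y: Sequence[int], bert: Sequence[int],
                     llm: Sequence[int]) -> List[int]:
    # Assumed target: defer (1) only where BERT is wrong and the LLM is right.
    return [int(b != t and l == t) for t, b, l in zip(y, bert, llm)]


@dataclass
class Router:
    mean: List[float]
    std: List[float]
    w: List[float]
    b: float

    def defer_prob(self, x: Sequence[float]) -> float:
        # Standardize with training statistics, then apply the logistic model.
        z = self.b + sum(wi * (xi - m) / s for wi, xi, m, s
                         in zip(self.w, x, self.mean, self.std))
        return _sigmoid(z)


def train_router(X: Sequence[Sequence[float]], d: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Router:
    # Logistic regression fit by full-batch gradient descent on log loss.
    n, k = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(k)]
    std = [math.sqrt(sum((x[j] - mean[j]) ** 2 for x in X) / n) or 1.0
           for j in range(k)]
    Z = [[(x[j] - mean[j]) / std[j] for j in range(k)] for x in X]
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for z_row, t in zip(Z, d):
            err = _sigmoid(b + sum(wi * zi for wi, zi in zip(w, z_row))) - t
            gw = [g + err * zi for g, zi in zip(gw, z_row)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return Router(mean, std, w, b)


def route(router: Router, X, bert, llm,
          threshold: float = 0.5) -> Tuple[List[int], float]:
    # Use the LLM label when the router says "defer", else keep BERT's label.
    preds, n_deferred = [], 0
    for x, pb, pl in zip(X, bert, llm):
        if router.defer_prob(x) >= threshold:
            preds.append(pl)
            n_deferred += 1
        else:
            preds.append(pb)
    return preds, n_deferred / len(X)


def f1(y: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    tp = sum(1 for t, p in zip(y, pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y, pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y, pred) if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    # Toy features per instance: [BERT uncertainty (1 - max prob), token count].
    X = [[0.05, 12], [0.10, 30], [0.08, 8], [0.45, 25],
         [0.40, 14], [0.48, 40], [0.02, 18], [0.15, 22]]
    y = [1, 0, 1, 1, 0, 1, 0, 1]
    bert = [1, 0, 1, 0, 1, 0, 0, 1]
    llm = [1, 0, 0, 1, 0, 1, 1, 1]
    router = train_router(X, deferral_targets(y, bert, llm))
    preds, rate = route(router, X, bert, llm)
    print(f"BERT F1={f1(y, bert):.3f} LLM F1={f1(y, llm):.3f} "
          f"routed F1={f1(y, preds):.3f} deferral rate={rate:.3f}")
```

On this toy data the router learns to defer the three high-uncertainty instances, so the routed F1 exceeds both BERT alone and the LLM alone, mirroring the complementarity the abstract describes. Raising `threshold` above 0.5 makes the router defer less often, which is one way to trade accuracy for fewer LLM API calls. The 7% and 16.8% deferral rates reported above come from the paper, not from this sketch.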
Problem

Research questions and friction points this paper is trying to address.

clinical text classification
model selection
learning to defer
large language models
BERT
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning to Defer
Clinical Text Classification
Model Selection
Large Language Models
Uncertainty-based Routing