AI Summary
Specialized large language models (LLMs) often produce unreliable outputs on out-of-distribution (OOD) inputs, posing safety risks in critical applications. To address this, we propose a post-hoc OOD detection method based on multi-layer dropout tolerance, the first to formalize this tolerance as a non-conformity score within the Inductive Conformal Anomaly Detection (ICAD) framework. Leveraging inherent polysemanticity and redundancy in LLMs, our approach quantifies response stability via ensemble-based stochastic dropout across multiple transformer layers. Theoretically, it guarantees controllable false positive rates under standard conformal prediction assumptions. Extensive experiments on medical-domain LLMs demonstrate significant improvements: AUROC increases by 2 to 37 percentage points over state-of-the-art baselines, markedly enhancing OOD detection accuracy while strictly bounding the false positive rate.
Abstract
We propose a novel inference-time out-of-domain (OOD) detection algorithm for specialized large language models (LLMs). Despite achieving state-of-the-art performance on in-domain tasks through fine-tuning, specialized LLMs remain vulnerable to incorrect or unreliable outputs when presented with OOD inputs, posing risks in critical applications. Our method leverages the Inductive Conformal Anomaly Detection (ICAD) framework, using a new non-conformity measure based on the model's dropout tolerance. Motivated by recent findings on polysemanticity and redundancy in LLMs, we hypothesize that in-domain inputs exhibit higher dropout tolerance than OOD inputs. We aggregate dropout tolerance across multiple layers via a valid ensemble approach, improving detection while maintaining theoretical false alarm bounds from ICAD. Experiments with medical-specialized LLMs show that our approach detects OOD inputs better than baseline methods, with AUROC improvements of 2% to 37% when treating OOD datapoints as positives and in-domain test datapoints as negatives.