Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

📅 2026-05-07

🤖 AI Summary
This work addresses hallucination in black-box large language models (LLMs) and the impracticality of existing uncertainty quantification methods, which typically require either multiple generations or access to model internals. The authors propose Distribution-Aligned Adversarial Distillation (DisAAD), which estimates uncertainty without repeated sampling and without access to the black-box LLM's internal parameters. DisAAD uses a generator-discriminator framework to guide a lightweight proxy model, which can be as small as 1% of the target LLM's size, to learn the high-quality regions of the target's output distribution. The proxy then reproduces the target's specific responses, and evidential learning is used to estimate the uncertainty of each response. According to the authors, experiments show that DisAAD yields reliable uncertainty quantification at low computational cost, making it applicable to commercial black-box LLMs accessed via APIs.
📝 Abstract
Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only via APIs. Existing uncertainty quantification methods typically depend on computationally expensive multiple sampling or on internal parameters, which prevents real-time estimation and fails to capture information implicit in the black-box reasoning process. To address this issue, we propose Distribution-Aligned Adversarial Distillation (DisAAD), which introduces a generation-discrimination architecture that guides a lightweight proxy model to learn the high-quality regions of the black-box LLM's output distribution, thereby enabling the proxy to know whether the black-box LLM knows the answer. We then use the proxy model to reproduce the specific responses of the black-box LLM and estimate the corresponding uncertainty with evidential learning. Extensive experiments verify the effectiveness and promise of the proposed method, indicating that even a proxy model only 1% of the target LLM's size can achieve reliable uncertainty quantification.
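The abstract does not spell out how uncertainty is derived from evidence, so the following is only a minimal sketch of the standard Dirichlet-based evidential formulation, which DisAAD may or may not follow exactly. It assumes the proxy model emits one raw score per class or candidate token; the function names `softplus` and `evidential_uncertainty` are hypothetical and do not come from the paper.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); maps any real score to non-negative evidence.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def evidential_uncertainty(logits):
    """Return (expected_probs, uncertainty) under a Dirichlet evidence model.

    Assumption: evidence e_k = softplus(logit_k), Dirichlet parameters
    alpha_k = e_k + 1, and uncertainty u = K / sum(alpha), as in standard
    evidential deep learning. This is not confirmed to be the paper's exact recipe.
    """
    k = len(logits)
    evidence = [softplus(z) for z in logits]
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                    # Dirichlet strength S
    probs = [a / strength for a in alpha]    # expected class probabilities
    uncertainty = k / strength               # in (0, 1]; near 1 when evidence is near 0
    return probs, uncertainty


if __name__ == "__main__":
    confident = evidential_uncertainty([10.0, 0.0, 0.0])
    unsure = evidential_uncertainty([-20.0, -20.0, -20.0])
    print("confident:", confident)
    print("unsure:", unsure)
```

In this formulation a single forward pass of the proxy yields both a prediction and an uncertainty score, which is what makes the approach attractive compared with sampling the black-box LLM many times.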
Problem

Research questions and friction points this paper is trying to address.

Black-box LLM
Uncertainty Quantification
Hallucination
Real-time Estimation
Proxy Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty Quantification
Black-box LLM
Adversarial Distillation
Proxy Model
Evidence Learning