Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses hallucination in black-box large language models (LLMs) and the impracticality of existing uncertainty quantification methods, which typically require either multiple generations or access to model internals. The authors propose Distribution-Aligned Adversarial Distillation (DisAAD), which they present as a novel approach that estimates uncertainty without access to the black-box LLM’s internal parameters. DisAAD uses a generator–discriminator framework to guide a lightweight proxy model, as small as 1% of the target LLM's size, toward the high-quality regions of the target’s output distribution, and then applies evidential learning to estimate the uncertainty of individual responses. The reported experiments show that DisAAD delivers reliable uncertainty quantification at low computational cost and outperforms existing baselines, which makes it practical for commercial black-box LLMs accessed via APIs.
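To make the generator–discriminator idea concrete, here is a minimal sketch of a standard GAN-style objective in which the proxy model plays the generator. It is not the authors' loss: the function names, the binary cross-entropy form, and the idea that the discriminator emits one probability per response are all assumptions made for illustration.

```python
import math
from typing import Sequence


def bce(prob: float, label: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single probability/label pair."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def discriminator_loss(d_target: Sequence[float], d_proxy: Sequence[float]) -> float:
    """Hypothetical discriminator objective.

    d_target: discriminator scores for responses from the black-box LLM.
    d_proxy:  discriminator scores for responses from the proxy model.
    The discriminator is rewarded for scoring target responses near 1
    and proxy responses near 0.
    """
    losses = [bce(p, 1.0) for p in d_target] + [bce(p, 0.0) for p in d_proxy]
    return sum(losses) / len(losses)


def generator_loss(d_proxy: Sequence[float]) -> float:
    """Hypothetical proxy (generator) objective.

    The proxy is rewarded when the discriminator mistakes its responses
    for responses from the black-box LLM, i.e. scores them near 1.
    """
    return sum(bce(p, 1.0) for p in d_proxy) / len(d_proxy)
```

In this sketch the proxy's loss falls as its responses become harder to tell apart from the target's, which is the sense in which adversarial training can pull the proxy toward the target's output distribution.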

📝 Abstract

Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only via APIs. Existing uncertainty quantification methods typically depend on computationally expensive multiple sampling or on internal parameters, which prevents real-time estimation and fails to capture information implicit in the black-box reasoning process. To address this issue, we propose Distribution-Aligned Adversarial Distillation (DisAAD), which introduces a generation-discrimination architecture to guide a lightweight proxy model to learn the high-quality regions of the black-box LLM's output distribution, so that the proxy can tell whether the black-box LLM knows the answer or not. We then use the proxy model to reproduce the specific responses of the black-box LLM and estimate the corresponding uncertainty with evidential learning. Extensive experiments verify the effectiveness and promise of the proposed method, indicating that even a proxy model only 1% of the target LLM's size can achieve reliable uncertainty quantification.
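The abstract does not spell out how evidence is turned into an uncertainty score. As an illustration only, the sketch below uses the common Dirichlet-based formulation of evidential deep learning, where non-negative evidence e_k for each of K classes gives alpha_k = e_k + 1 and uncertainty u = K / sum(alpha). Whether DisAAD uses exactly this form is an assumption, and the function name and inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def evidential_uncertainty(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Dirichlet-based uncertainty from per-class evidence (illustrative).

    evidence: non-negative evidence values, one per class (for example,
              produced by a proxy model head; this source is assumed).
    Returns (expected class probabilities, uncertainty in (0, 1]).
    """
    if len(evidence) == 0 or any(e < 0 for e in evidence):
        raise ValueError("evidence must be a non-empty sequence of non-negative values")

    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]        # Dirichlet parameters
    strength = sum(alpha)                      # total Dirichlet strength S
    probs = [a / strength for a in alpha]      # expected class probabilities
    uncertainty = num_classes / strength       # u = K / S
    return probs, uncertainty


# No evidence at all: uniform prediction and maximal uncertainty.
print(evidential_uncertainty([0.0, 0.0, 0.0, 0.0]))
# Strong evidence for the first class: low uncertainty.
print(evidential_uncertainty([18.0, 0.0]))
```

Under this formulation a single forward pass yields both a prediction and an uncertainty value, which is consistent with the abstract's goal of avoiding repeated sampling from the black-box LLM.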

Problem

Research questions and friction points this paper is trying to address.

Black-box LLM

Uncertainty Quantification

Hallucination

Real-time Estimation

Proxy Model

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty Quantification

Black-box LLM

Adversarial Distillation