Identified-Set Geometry of Distributional Model Extraction under Top-$K$ Censored API Access

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language model APIs often return only the top-K logits and censor the rest of the vocabulary, which prevents full reconstruction of the output distribution. This work studies the theoretical limits of per-position distribution recovery under this access model. For the identified set of teacher distributions compatible with the observed logits, it derives an exact expression for the total-variation diameter and gives a computable lower bound and an asymptotically matching small-ambiguity upper bound for KL recovery. Experiments on a Qwen3 math-reasoning teacher show that on-task top-K distillation recovers 12% of the teacher's private capability, full-logit distillation recovers 56% despite 99% KL closure, and generation-based extraction recovers 96%. Top-K censoring therefore limits per-position distribution recovery but does not by itself prevent capability extraction.

📝 Abstract

Modern LLM APIs often reveal only top-$K$ logit scores and censor the remaining vocabulary. We study the per-position distribution-recovery limits of this access model. For censoring threshold $\tau$, the compatible teacher distributions form an identified set whose total-variation diameter is exactly $U_K=(V-K)\exp(\tau)/(Z_A+(V-K)\exp(\tau))$, where $Z_A$ is the observed partition function. For KL recovery, we give a computable binary-endpoint lower bound and an asymptotically matching small-ambiguity upper bound, with an extension to reference-aware attackers. Experiments on a Qwen3 math-reasoning teacher reveal a layered extraction hierarchy: on-task top-$K$ distillation recovers 12% of private capability, full-logit distillation recovers 56% despite 99% KL closure, and generation-based extraction recovers 96%. Top-$K$ censoring therefore limits per-position distribution recovery but does not by itself prevent capability extraction, separating fidelity from transfer in prompt-only logit distillation.
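
The diameter formula in the abstract can be evaluated directly from one API response. The sketch below is illustrative only and is not the authors' code: the function name `tv_diameter` and its parameters are hypothetical, and it assumes that $Z_A$ is the sum of $\exp(\cdot)$ over the $K$ revealed logits and that $\tau$ upper-bounds every censored logit. A common shift is subtracted before exponentiating so that large logits do not overflow.

```python
import math


def tv_diameter(topk_logits, tau, vocab_size):
    """Evaluate U_K = (V-K) e^tau / (Z_A + (V-K) e^tau) for one position.

    Assumptions (not stated in the source): Z_A sums exp(logit) over the
    revealed top-K logits, and tau is an upper bound on each censored logit.
    """
    k = len(topk_logits)
    if k == 0:
        raise ValueError("need at least one revealed logit")
    if vocab_size < k:
        raise ValueError("vocab_size must be at least the number of revealed logits")
    if vocab_size == k:
        return 0.0  # nothing is censored, so the identified set is a single point

    # Subtract a common shift from every exponent; the ratio is unchanged.
    shift = max(max(topk_logits), tau)
    z_a = sum(math.exp(logit - shift) for logit in topk_logits)
    tail = (vocab_size - k) * math.exp(tau - shift)
    return tail / (z_a + tail)


# Two revealed logits at 0, threshold 0, vocabulary of 4:
# Z_A = 2 and (V-K) e^tau = 2, so U_K = 2 / (2 + 2).
print(tv_diameter([0.0, 0.0], tau=0.0, vocab_size=4))  # 0.5
```

Under these assumptions $U_K$ equals the largest probability mass the censored tail can carry, so it shrinks toward 0 as $\tau$ falls far below the revealed logits and grows toward 1 as $V-K$ increases at fixed $\tau$.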

Problem

Research questions and friction points this paper is trying to address.

top-K censoring

distribution recovery

identified set

LLM APIs

logit extraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

top-K censoring

distributional model extraction

identified set